
Planet Python

Last update: July 26, 2017 09:47 PM

July 26, 2017


PyPy Development

Binary wheels for PyPy

Hi,

This is a short blog post, just to announce the existence of this GitHub repository, which contains binary PyPy wheels for some selected packages. The availability of binary wheels means that you can install the packages much more quickly, without having to wait for compilation.


At the time of writing, several packages are available; see the repository for the full list.


For now, we provide only wheels built on Ubuntu, compiled for PyPy 5.8.
In particular, it is worth noting that they are not manylinux1 wheels, which means they might not work on other Linux distributions. For more information, see the explanation in the README of the above repo.

Moreover, the existence of the wheels does not guarantee that they work correctly 100% of the time. They still depend on cpyext, our C-API emulation layer, which is still a work in progress, although it has become better and better over the last few months. Again, the wheels are there only to save compilation time.

To install a package from the wheel repository, you can invoke pip like this:

$ pip install --extra-index-url https://antocuni.github.io/pypy-wheels/ubuntu numpy

Happy installing!

July 26, 2017 04:45 PM


Tarek Ziade

Python Microservices Development

My new book, Python Microservices Development is out!

The last time I wrote a book, I was pretty sure I would not write a new one -- or at least not about Python. Writing a book is a lot of work.

The hard part is mostly about not quitting. After a day of work and taking care of the kids, sitting down again at my desk for a couple of hours was just hard, in particular since I do other stuff like running. I stole time from my wife.

The topic, "microservices," was also not an easy one to settle on. When Packt first approached me about writing it, I said no, because I could not see any value in writing yet another book on that trendy (if not buzzwordy) topic.

But the project grew on me. I realized that over the past seven years of working on services at Mozilla, we had moved from a monolithic model to a microservices model. It happened because we moved most of our services to a cloud vendor, and when you do that, your application consumes a lot of services, and you end up splitting your applications into smaller pieces.

While picking Python 3 was a given, I hesitated a lot about writing the book using an asynchronous framework. I ended up sticking with a synchronous framework (Flask). Synchronous programming still seems to be mainstream in Python land. If we do a 2nd edition in a couple of years, I would probably use aiohttp :)

The other challenge is English. It is not my native language, and while I used Grammarly and was helped a lot by Packt (they have improved their editing process a lot since my first book there), it's probably something you will notice if you read it.

Technically speaking, I think I have done a good job of explaining how I think microservices should be developed. It should be useful for people who are wondering how to build their applications. I wish I had had more time to finish polishing some of the code that goes with the book, but thankfully that's on GitHub, so I still have a bit of time to finish that.

Kudos to Wil Khane-Green, my technical reviewer, who did fantastic work. The book content is much better thanks to him.

If you buy the book, let me know what you think, and do not hesitate to interact with me on GitHub or by email.

July 26, 2017 04:22 PM


DataCamp

New Python Course: Data Types for Data Science

Hello Python users! New course launching today: Data Types for Data Science by Jason Myers!

Have you got your basic Python programming chops down for Data Science but are yearning for more? Then this is the course for you. Herein, you'll consolidate and practice your knowledge of lists, dictionaries, tuples, sets, and datetimes. You'll see their relevance in working with lots of real data and how to leverage several of them in concert to solve multistep problems, including an extended case study using Chicago metropolitan area transit data. You'll also learn how to use many of the objects in Python's collections module, which will allow you to store and manipulate your data for a variety of data science purposes. After taking this course, you'll be ready to tackle many Data Science challenges Pythonically.

Take me to chapter 1!

Data Types for Data Science features interactive exercises that combine high-quality video, in-browser coding, and gamification for an engaging learning experience that will make you a master in data science with Python!

What you'll learn: 

Chapter 1: Fundamental data types

This chapter will introduce you to the fundamental Python data types - lists, sets, and tuples. These data containers are critical as they provide the basis for storing and looping over ordered data. To make things interesting, you'll apply what you learn about these types to answer questions about the New York Baby Names dataset!

Chapter 2: Dictionaries - the root of Python

At the root of all things Python is a dictionary. Herein, you'll learn how to use them to safely handle data that can be viewed in a variety of ways to answer even more questions about the New York Baby Names dataset. You'll explore how to loop through data in a dictionary, access nested data, add new data, and come to appreciate all of the wonderful capabilities of Python dictionaries.

Chapter 3: Meet the collections module

The collections module is part of Python's standard library and holds some more advanced data containers. You'll learn how to use the Counter, defaultdict, OrderedDict and namedtuple in the context of answering questions about the Chicago transit dataset.
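If you haven't met these containers before, here's a quick illustrative sketch (the transit data here is invented; the course uses the real Chicago dataset):

from collections import Counter, defaultdict, namedtuple

# Count how often each line appears in a list of rides.
rides = ['Red', 'Red', 'Blue', 'Red']
print(Counter(rides).most_common(1))    # [('Red', 3)]

# Group station names by line without pre-creating the lists.
stations = defaultdict(list)
stations['Red'].append('Clark/Division')

# Give tuple fields readable names.
Ride = namedtuple('Ride', ['station', 'riders'])
print(Ride('Clark/Division', 5421).riders)    # 5421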

Chapter 4: Handling Dates and Times

Handling times can seem daunting at times, but here, you'll dig in and learn how to create datetime objects, print them, look to the past and to the future. Additionally, you'll learn about some third party modules that can make all of this easier. You'll continue to use the Chicago Transit dataset to answer questions about transit times.

Chapter 5: Answering Data Science Questions

Finally, time for a case study to reinforce all of your learning so far! You'll use all the containers and data types you've learned about to answer several real world questions about a dataset containing information about crime in Chicago.

Learn all there is to know about Data Types for Data Science today!

July 26, 2017 02:32 PM


The Three of Wands

attrs I: The Basics

This is the first article in my series on the inner workings of attrs.

Attrs is a Python library for defining classes in a different (much better) way. The docs can be found at attrs.readthedocs.org and are pretty good; they explain how and why you should use attrs. And as a Python developer in 2017, you should be using attrs.

This attrs series is about how attrs works under the hood, so go read the docs and use it somewhere first. It'll make following along much easier. The source code is available on GitHub; at this time it's not a particularly large codebase at around 900 lines of non-test code. (Django, as an example, currently has around 76000 lines of Python.)

Here's the simplest useful class the attrs way:

@attr.s
class C:  
    a = attr.ib()

(I'm omitting boilerplate like imports and using Python 3.6+.)

This will get you a class with a single attribute and the most common boilerplate (__init__, __repr__, ...) generated and ready to be used. But what's actually happening here?

Let's take a look at this class without the attr.s decorator applied. (Leaving it out is an error and won't get you a working class; we're doing it here only to take a look under the hood.)

class C:  
    a = attr.ib()

So this is just a class with a single class (i.e. not instance) attribute, assigned the value of whatever the attr.ib() function returns. attr.ib is just a reference to the attr._make.attr function, which is a fairly thin wrapper around the attr._make._CountingAttr class.

This is a private class (as the leading underscore suggests) that holds the intermediate attribute state until the attr.s class decorator comes along and does something with it.

>>> C.a
_CountingAttr(counter=8, _default=NOTHING, repr=True, cmp=True, hash=None, init=True, metadata={})  

The counter is a global variable that gets incremented and assigned to _CountingAttr instances when they're created. It's there so you can count on the consistent ordering of attributes:

@attr.s
class D:  
    a = attr.ib()
    b = attr.ib()
    c = attr.ib()
    d = attr.ib()

>>> [a.name for a in attr.fields(D)]
['a', 'b', 'c', 'd']  # Note the ordering.

Attrs has relatively recently added a new way of defining attribute defaults:

@attr.s
class E:  
    a = attr.ib()

    @a.default
    def a_default(self):
        return 1

As you might guess by now, default is just a _CountingAttr method that updates its internal state. (It's also the reason the field on _CountingAttr instances is called _default and not default.)

attr.s is a class decorator that gathers up these _CountingAttrs and converts them into attr.Attributes, which are public and immutable, before generating all the other methods. The Attributes get put into a tuple at C.__attrs_attrs__, and this tuple is what you get when you call attr.fields(C). If you want to inspect an attribute, fetch it using attr.fields(C).a and not C.a. C.a is deprecated and scheduled to be removed soon, and doesn't work on slot classes anyway.
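For example, with the class C from above (the Attribute repr is abbreviated here; its exact fields vary between attrs versions):

>>> attr.fields(C)
(Attribute(name='a', default=NOTHING, ...),)
>>> attr.fields(C).a.name
'a'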

Now, armed with this knowledge, you can customize your attributes before they get transformed and the other boilerplate methods get generated.

You'll also need some courage, since _CountingAttrs are a private implementation detail and might work differently in the next release of attrs. Attributes are safe to use and follow the usual deprecation period; ideally you should apply your customizations after the application of attr.s. I've chosen an example that's much easier to implement before attr.s.

As an exercise, let's code up a class decorator that will set all your attribute defaults to None if no other default was set (no default set is indicated by the _default field having the sentinel value attr.NOTHING). We just need to iterate over all _CountingAttrs and change their _default fields.

from attr import NOTHING  
from attr._make import _CountingAttr

def add_defaults(cl):  
    for obj in cl.__dict__.values():
        # Skip anything that isn't an attrs attribute, and any attribute
        # that already has an explicit default set.
        if not isinstance(obj, _CountingAttr) or obj._default is not NOTHING:
            continue
        obj._default = None
    return cl

Example usage:

@attr.s
@add_defaults
class C:  
    a = attr.ib(default=5)
    b = attr.ib()

>>> C()
C(a=5, b=None)  

July 26, 2017 12:41 PM


Catalin George Festila

The gtts Python module.

This Python module, gtts, creates an MP3 file from text via the Google TTS (Text-to-Speech) API.
Here is the installation of the gtts module under Windows 10:

C:\Python27\Scripts>pip install gtts
Collecting gtts
Downloading gTTS-1.2.0.tar.gz
Requirement already satisfied: six in c:\python27\lib\site-packages (from gtts)
Requirement already satisfied: requests in c:\python27\lib\site-packages (from gtts)
Collecting gtts_token (from gtts)
Downloading gTTS-token-1.1.1.zip
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in c:\python27\lib\site-packages (from requests->gtts)
Requirement already satisfied: certifi>=2017.4.17 in c:\python27\lib\site-packages (from requests->gtts)
Requirement already satisfied: idna<2.6,>=2.5 in c:\python27\lib\site-packages (from requests->gtts)
Collecting urllib3<1.22,>=1.21.1 (from requests->gtts)
Using cached urllib3-1.21.1-py2.py3-none-any.whl
Installing collected packages: gtts-token, gtts, urllib3
Running setup.py install for gtts-token ... done
Running setup.py install for gtts ... done
Found existing installation: urllib3 1.22
Uninstalling urllib3-1.22:
Successfully uninstalled urllib3-1.22
Successfully installed gtts-1.2.0 gtts-token-1.1.1 urllib3-1.21.1
Let's see a basic example:
from gtts import gTTS
import os
import pygame.mixer
from time import sleep

user_text = input("Type your text: ")

# Generate the speech and save it to a file. Note that gTTS writes
# MP3 data, regardless of the extension used for the file name.
translate = gTTS(text=user_text, lang='en')
translate.save('output.wav')

# Play the resulting file with pygame.
pygame.mixer.init()
path_name = os.path.realpath('output.wav')
real_path = path_name.replace('\\', '\\\\')
pygame.mixer.music.load(open(real_path, "rb"))
pygame.mixer.music.play()
while pygame.mixer.music.get_busy():
    sleep(1)
The text is read with input() into the user_text variable. Note that under Python 2, input() evaluates what you type, so you need to wrap the text in quotes or you will get an error (raw_input() avoids this). The result is an audio file named output.wav, which is then played with the pygame module. gTTS uses the default voice for each language; I have not found a way to change the voice from Python.

July 26, 2017 11:34 AM


PyCharm

PyCharm 2017.2 Out Now: Docker Compose on Windows, SSH Agent and more

PyCharm 2017.2 is out now! Get it today for Docker Compose on Windows, SSH Agent, Azure Databases, and Amazon Redshift support.



Get PyCharm 2017.2 now from our website!

Please let us know what you think about PyCharm! You can reach us on Twitter, Facebook, and by leaving a comment on the blog.

PyCharm Team
-The Drive to Develop

July 26, 2017 10:11 AM

July 25, 2017


Stefan Behnel

What's new in Cython 0.26?

Cython 0.26 has finally been released and it comes with some big and several smaller new features, contributed by quite a number of non-core developers.

Probably the biggest addition, definitely code-wise, is support for Pythran as a backend for NumPy array expressions, contributed by Adrien Guinet. Pythran understands many usage patterns for NumPy, including array expressions and some methods, and can be used directly from within Cython to compile NumPy-using code by setting the directive np_pythran=True. Thus, if Pythran is available at compile time, users can avoid writing manual loops and instead often just use NumPy in the same way as they would from Python. Note that this does not currently work generically with Cython's memoryviews, so you need to declare the specific numpy.ndarray[] types in order to benefit from the translation. Also, this requires Cython's C++ mode, as Pythran generates C++ code.
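As a rough sketch of what this looks like (assuming Pythran is installed; the function and the array expression are invented for illustration):

# cython: np_pythran=True
# distutils: language = c++
import numpy as np
cimport numpy as cnp

def weighted_sum(cnp.ndarray[double, ndim=1] a,
                 cnp.ndarray[double, ndim=1] b):
    # The whole array expression is handed to Pythran instead of being
    # evaluated through a chain of NumPy calls and temporaries.
    return np.sum(a * 2.0 + b)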

The other major new feature in this release, and likely the one with an even wider impact, is pickling support for cdef classes (a.k.a. extension types). This is enabled by default for all classes with Python-compatible attribute types, which explicitly excludes pointers and unions. Classes with struct-type attributes are also excluded for practical reasons (such as a high code overhead), but pickling can be enabled for them with the class decorator @cython.auto_pickle(True). This was a long-standing feature request for which users previously had to implement the pickle protocol themselves. Since Cython has all the information about the extension type and its attributes, however, there was no technical reason why it couldn't also generate the support for pickling them, and now it does.
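A minimal sketch of the opt-in decorator (the class itself is invented; for a class with only double attributes like this one, pickling is already enabled by default, so the decorator merely makes the choice explicit):

cimport cython

@cython.auto_pickle(True)
cdef class Point:
    cdef public double x, y

    def __init__(self, double x, double y):
        self.x = x
        self.y = y

After this, pickle.dumps(Point(1.0, 2.0)) round-trips without a hand-written __reduce__().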

As always, there are several new optimisations, including speed-ups for abs(complex), string comparisons, and dispatching to specialised function implementations based on fused-type arguments. Particularly interesting might be the faster GIL re-entry with the directive fast_gil=True. It tries to remember the current GIL lock state in fast thread-local storage and avoids costly calls into the thread and GIL handling APIs where possible, even when calling across multiple Cython modules.
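The directive can be set at the module level; here's a sketch of the kind of code it targets (an invented example, assuming the usual header-comment directive syntax):

# cython: fast_gil=True

cdef double total(double[:] xs) nogil:
    cdef double s = 0
    cdef Py_ssize_t i
    for i in range(xs.shape[0]):
        s += xs[i]
    return s

def compute(double[:] xs):
    cdef double result
    # Releasing and re-acquiring the GIL around this call is exactly
    # what fast_gil=True is meant to make cheaper.
    with nogil:
        result = total(xs)
    return result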

A slightly controversial change now hides the C code lines from tracebacks by default. Cython exceptions used to show the failing line number of the generated C code in addition to the line in the Cython module; now only the latter is shown, as in Python code. On the one hand, this removes clutter that is irrelevant for most users. On the other hand, it hides information that could help developers debug failures from bug reports. For debugging purposes, the re-inclusion of C code lines can now be enabled with a runtime setting as follows:

import cython_runtime
cython_runtime.cline_in_traceback=True

July 25, 2017 02:33 PM

What's new in Cython 0.25?

Cython 0.25 was released in October 2016, so here's a quick writeup of the most relevant new features in this release.

My personal favourites are the call optimisations. Victor Stinner has done a great job throughout the year to optimise and benchmark different parts of CPython, and one of the things he came up with was a faster way to process function calls internally. For this, he added a new calling convention, METH_FASTCALL, which avoids tuple creation by passing positional arguments as a C array. I've added support for this to Cython and also ported a part of the CPython implementation to speed up calls to Python functions and PyCFunction functions from Cython code.

Further optimisations speed up f-string formatting, cython.inline() and some Python long integer operations (now also in Py2.7 and not only Py3).

The next big feature, one that has been on our list essentially forever, was finally added in 0.25. If you declare a special attribute __dict__ on a cdef class (a.k.a. extension type), it will have an instance dict that allows setting arbitrary attributes on the objects. Otherwise, extension types are limited to the declared attributes, and trying to access undeclared ones results in an AttributeError.
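The declaration looks like this (a minimal sketch):

cdef class Flexible:
    cdef dict __dict__    # opts this extension type into an instance dict
    cdef public int declared

f = Flexible()
f.anything_goes = 42      # would raise AttributeError without the __dict__ line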

The C++ integration has received several minor improvements, including support for calling subclass methods from C++ classes implemented in Cython code, iterating over std::string with Cython's for-loop, typedef members in class declarations, or the typeid operator.

And a final little goodie is redundant declarations for pi and e in libc.math, which makes from libc cimport math pretty much a drop-in replacement for Python's import math.
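So, as a tiny example, this now works:

from libc cimport math

print(math.pi, math.e)    # the newly declared constants
print(math.sin(math.pi))  # alongside the usual C math functions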

July 25, 2017 02:33 PM


PyCharm

Interview: Paul Craven on Python Gaming and Teaching

Writing games in Python is fun, so how about using it to teach computer programming? Paul Craven is both a professor and the creator of Arcade, a 2D game library for Python. He’s doing a webinar with us next week, so we talked to him about teaching Python, using Python 3 type hints, why another Python game library, and how IDEs fit into teaching.


Thanks a bunch for doing the webinar next week. First, can you tell us a little bit about yourself?

I worked in the IT industry for 15 years before switching to teaching at Simpson College, a small 4-year college in Iowa. My main interest has been getting first-time programmers to realize that programming can be fun. That moment when a student cheers out loud because they finally figured out how to get sprites to move correctly on the screen? It is a beautiful thing to see.

You teach programming, and you created the Arcade game library to help. Before talking about Arcade, can you explain the motivation behind having a framework that you can teach?

Teaching is like engineering. Each semester you work to improve how you teach students. I had been using the Pygame library. But I wanted a library that I could improve on based on what I saw from students. For example:

Function and parameter names that students intuitively understand. Each year I had to teach them “set_mode” opens a window. Why not just have the function called “open_window”?
Support for functions which students ask for. In Pygame, drawing ellipses with thick borders always had a moiré pattern because of a bug in the library. And you can’t tilt ellipses when you draw them. Every year I have a student who wants to draw a football. And each year they were frustrated that it looked awful. I wanted a library where it just worked.
Students would download a graphic for their sprite. But it would be too large and there was no easy way to scale the image. That always resulted in hours of wasted time explaining how to do the scaling. What if the library just supported scaling?

After a while I had collected such a long list of things like that, I decided to create a game library where I wouldn’t have to teach “around” these issues.

Beyond using it for teaching, can you talk a bit about Arcade? What is it, how is it different, who might want to use it?

Arcade is great for sprite-based 2D games. It is simple and easy to get started with the library. There is no need to learn a complex framework. If you’ve wanted to create a game for fun, but Kivy, Unity, Unreal would just take more time to learn than what you’ve got, Arcade is a better choice. If you want to quickly create a scientific visualization without a lot of overhead code, Arcade can help you there too.

Arcade uses OpenGL and Pyglet. With OpenGL acceleration, it can draw a lot of sprites fast.
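To give a flavor of the naming he describes, here's a minimal sketch using Arcade's documented drawing functions (window size and colors are arbitrary):

import arcade

arcade.open_window(600, 400, "Hello Arcade")   # not set_mode!
arcade.set_background_color(arcade.color.WHITE)

arcade.start_render()
arcade.draw_circle_filled(300, 200, 50, arcade.color.RED)
arcade.finish_render()

arcade.run()   # keeps the window open until it is closed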

I use Arcade for my PyCharm tutorials and Arcade’s Python 3.6 type hinting is a big reason. Can you talk about your experience, as a library author and teacher, with type hinting?

New programmers often get confused when calling functions and methods. What data does it expect? And when the program doesn’t run, the students aren’t yet experts in reading stack traces. So they are stuck with a 500-line program that doesn’t work and they don’t know why. Frustrating.

Type hinting can sometimes tell students that they are passing unexpected data to the function. It does this before they run the program. Before they’ve even moved to writing the next line of code. It seems trivial, but it’s not. I found students able to create better, more complex programs because PyCharm and type hinting kept them from that error and allowed them to move on.

You also use PyCharm CE with your students. What’s been your experience having beginners start with an IDE?

I’ve taught students with an IDE and without an IDE. The biggest advantage is how the IDE can help students catch errors early. PyCharm’s built-in PEP 8 checking is huge. Also, built-in spell-checking! Imagine trying to read program comments written without a spell-checker from today’s students. Students come up with some interesting ways to spell words.

July 25, 2017 09:38 AM


Python Insider

Python 3.5.4rc1 and Python 3.4.7rc1 are now available

Python 3.5.4rc1 and Python 3.4.7rc1 are now available for download.

You can download Python 3.5.4rc1 here, and you can download Python 3.4.7rc1 here.

July 25, 2017 04:40 AM


Peter Bengtsson

Find static files defined in django-pipeline but not found

If you're reading this you're probably familiar with how, in django-pipeline, you define bundles of static files to be combined and served. If you're not familiar with django-pipeline, it's unlikely this'll be of much help.

The Challenge (aka. the pitfall)

So you specify bundles by adding something like this in your settings.py:

PIPELINE = {
    'STYLESHEETS': {
        'colors': {
            'source_filenames': (
              'css/core.css',
              'css/colors/*.css',
              'css/layers.css'
            ),
            'output_filename': 'css/colors.css',
            'extra_context': {
                'media': 'screen,projection',
            },
        },
    },
    'JAVASCRIPT': {
        'stats': {
            'source_filenames': (
              'js/jquery.js',
              'js/d3.js',
              'js/collections/*.js',
              'js/aplication.js',
            ),
            'output_filename': 'js/stats.js',
        }
    }
}

You do a bit more configuration and now, when you run ./manage.py collectstatic --noinput, Django and django-pipeline will gather up all static files from all installed Django apps, then start post-processing them, doing things like concatenating them into one file, minification, etc.

The problem is, if you look at the example snippet above, there's a typo. Instead of js/application.js it's accidentally js/aplication.js. Oh noes!!

What's sad is that nobody will notice (running ./manage.py collectstatic will exit with a 0). At least not unless you do some careful manual reviewing. Perhaps you will notice later, when you've pushed the site to prod, that the output file js/stats.js actually doesn't contain the code from js/application.js.

Or, you can automate it!

A Solution (aka. the hack)

I started this work this morning because the error actually happened to us. Thankfully not in production but our staging server produced a rendered HTML page with <link href="/static/css/report.min.cd784b4a5e2d.css" rel="stylesheet" type="text/css" /> which was an actual file but it was 0 bytes.

It wasn't that hard to figure out what the problem was because of the context of recent changes but it would have been nice to catch this during continuous integration.

So what we did was add an extra class, myproject.finders.LeftoverPipelineFinder, to settings.STATICFILES_FINDERS. So now it looks like this:

# in settings.py

STATICFILES_FINDERS = (
    'django.contrib.staticfiles.finders.FileSystemFinder',
    'django.contrib.staticfiles.finders.AppDirectoriesFinder',
    'pipeline.finders.PipelineFinder',
    'myproject.finders.LeftoverPipelineFinder',  # the new hotness!
)

And here's the class implementation:

from pipeline.finders import PipelineFinder

from django.conf import settings
from django.core.exceptions import ImproperlyConfigured


class LeftoverPipelineFinder(PipelineFinder):
    """This finder is expected to come AFTER 
    django.contrib.staticfiles.finders.FileSystemFinder and 
    django.contrib.staticfiles.finders.AppDirectoriesFinder in 
    settings.STATICFILES_FINDERS.
    If a path is looked for here it means it's trying to find a file
    that none of the regular staticfiles finders could find.
    """
    def find(self, path, all=False):
        # Before we raise an error, try to find out where,
        # in the bundles, this was defined. This will make it easier to correct
        # the mistake.
        for config_name in 'STYLESHEETS', 'JAVASCRIPT':
            config = settings.PIPELINE[config_name]
            for key in config:
                if path in config[key]['source_filenames']:
                    raise ImproperlyConfigured(
                        'Static file {!r} can not be found anywhere. Defined in '
                        "PIPELINE[{!r}][{!r}]['source_filenames']".format(
                            path,
                            config_name,
                            key,
                        )
                    )
        # If the file can't be found AND it's not in bundles, there's
        # got to be something else really wrong.
        raise NotImplementedError(path)

Now, if you have a typo or something in your bundles, you'll get a nice error about it as soon as you try to run collectstatic. For example:

▶ ./manage.py collectstatic --noinput
Post-processed 'css/search.min.css' as 'css/search.min.css'
Post-processed 'css/base.min.css' as 'css/base.min.css'
Post-processed 'css/base-dynamic.min.css' as 'css/base-dynamic.min.css'
Post-processed 'js/google-analytics.min.js' as 'js/google-analytics.min.js'
Traceback (most recent call last):
...
django.core.exceptions.ImproperlyConfigured: Static file 'js/aplication.js' can not be found anywhere. Defined in PIPELINE['JAVASCRIPT']['stats']['source_filenames']

Final Thoughts

This was a morning hack. I'm still not entirely sure if this is the best approach, but there was none better and the result is pretty good.

We run ./manage.py collectstatic --noinput in our continuous integration just before it runs ./manage.py test. So if you make a Pull Request that has a typo in bundles.py, it will get caught.

Unfortunately, it won't find missing files if you use foo*.js or something like that. django-pipeline uses glob.glob to convert expressions like that into a list of actual files; that depends on the filesystem, and all of it happens before the django.contrib.staticfiles.finders.find function is called.
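One partial mitigation -- a hedged sketch only, assuming your source files all live under directories listed in settings.STATICFILES_DIRS (app-level static directories are ignored here) -- would be a check like this, run at startup or in CI:

import glob
import os

from django.conf import settings


def check_pipeline_globs():
    """Warn about PIPELINE glob patterns that match no files."""
    for config_name in ('STYLESHEETS', 'JAVASCRIPT'):
        for key, bundle in settings.PIPELINE[config_name].items():
            for pattern in bundle['source_filenames']:
                if '*' not in pattern:
                    continue  # plain paths are caught by the finder above
                if not any(glob.glob(os.path.join(root, pattern))
                           for root in settings.STATICFILES_DIRS):
                    print('Warning: {!r} in PIPELINE[{!r}][{!r}] '
                          'matches no files'.format(pattern, config_name, key))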

If you have any better suggestions to solve this, please let me know.

July 25, 2017 12:12 AM


Daniel Bader

How to Install and Uninstall Python Packages Using Pip

A step-by-step introduction to basic Python package management skills with the “pip” command. Learn how to install and remove third-party modules from PyPI.

Python is approaching its third decade of good old age, and over the years many people have contributed to the creation of Python packages that perform specific functions and operations.

As of this writing, there are ~112K packages listed on the PyPI website. PyPI is short for “Python Package Index”, a central repository for free third-party Python modules.

This large and convenient module ecosystem is what makes Python so great to work with:

You see, most Python programmers are really assemblers of Python packages, which take care of a big chunk of the programming load required by modern applications.

Chances are that there is more than one Python package ready to be unleashed and help you with your specific programming needs.

For instance, while reading dbader.org, you may notice that the pages on the site render emoji quite nicely. You may wonder…

I’d like to use emoji on my Python app!

Is there a Python package for that?

Let’s find out!

Here’s what we’ll cover in this tutorial:

  1. Finding Python Packages
  2. What to Look for in a Python Package
  3. Installing Python Packages With Pip
  4. Capturing Installed Python Packages with Requirements Files
  5. Visualizing Installed Packages
  6. Installing Python Packages From a requirements.txt File
  7. Uninstalling Python Packages With Pip
  8. Summary & Conclusion

Finding Python Packages

Let’s use the emoji use case as an example. We find emoji related Python packages by visiting the PyPI website and searching for emoji via the search box on the top right corner of the page.

As of this writing, PyPI lists 94 packages, of which a partial list is shown below.

[Screenshot: PyPI search results for “emoji” packages]

Notice the “Weight*” header of the middle column. That’s a key piece of information. The weight value is basically a search scoring number, which the site calculates for each package to rank them and list them accordingly.

If we read the footnote it tells us that the number is calculated by “the occurrence of search term weighted by field (name, summary, keywords, description, author, maintainer).”

Does that mean that the top one is the best package?

Not necessarily. Although uncommon, a package maintainer may stuff emoji into every field to try to push the package to the top of the ranking, and that could well happen.

Conversely, many developers don’t do their homework and don’t bother filling out all the fields for their packages, which results in those packages being ranked lower.

You still need to research the packages listed, including a consideration for what your specific end use may be. For instance, a key question could be:

Which environment do you want to implement emoji on? A terminal-based app, or perhaps a Django web app?

If you are trying to display emoji on a Django web app, you may be better off with the 10th package down the list shown above (package django-emoji 2.2.0).

For our use case, let’s assume that we are interested in emoji for a terminal based Python app.

Let’s check out the first one on our list (package emoji 0.4.5) by clicking on it.

What to Look for in a Python Package

The following are characteristics of a good Python package:

  1. Decent documentation: By reading it we can get a clue as to whether the package could meet our need or not;
  2. Maturity and stability: It’s been around for some time, proven by both its age and its successive versions;
  3. Number of contributors: Healthy packages (especially complex ones) tend to have a healthy number of maintainers;
  4. Maintenance: It undergoes maintenance on a regular basis (we live in an ever-evolving world).

Although I would check it out, I wouldn’t rely too much on the development status listed for each package, that is, whether it’s a 4 - Beta or 5 - Production/Stable package. That classification is in the eye of the package creator and not necessarily reliable.

On our emoji example, the documentation seems decent. At the top of the page, we get a graphical indication of the package at work (see snippet below), which shows emoji on a Python interpreter. Yay!

[Screenshot: emoji rendered on a Python interpreter]

The documentation for our emoji package also tells us about installing it, how to contribute to its development, etc., and points us to a GitHub page for the package, which is a great source of useful information about it.

By visiting its GitHub page, we can glean from it that the package has been around for at least two years, was last maintained in the past couple of months, has been starred 300+ times, has been forked 58 times, and has 10 contributors.

It’s looking good! We have identified a good candidate to incorporate emoji-ing into our Python terminal app.

How do we go about installing it?

Installing Python Packages With Pip

At this time, I am assuming that you already have Python installed on your system. There is plenty of info out there as to how to accomplish that.

Once you install Python, you can check whether pip is installed by running pip --version on a terminal.

I get the following output:

$ pip --version
pip 9.0.1 from /Library/Frameworks/Python.framework/↵
Versions/3.5/lib/python3.5/site-packages (python 3.5)

Since Python 3.4, pip is bundled with the Python installation package. If for some reason it is not installed, go ahead and get it installed.

I highly recommend also that you use a virtual environment (and more specifically, virtualenvwrapper), a set of extensions that…

…include wrappers for creating and deleting virtual environments and otherwise managing your development workflow, making it easier to work on more than one project at a time without introducing conflicts in their dependencies.

For this tutorial, I have created a virtual environment called pip-tutorial, which you will see going forward. My other tutorial walks you through setting up Python and virtualenvwrapper on Windows.

Below you’ll see how package dependencies can bring complexity into our already complex development environments, which is the reason why using virtual environments is a must for Python development.

A great place to start learning about a terminal program is by running it without any options on the terminal. So, on your terminal, run pip. You would get a list of Commands and General Options.

Below is a partial list of the results on my terminal:

[Screenshot: pip commands and general options]

From there on you could run pip install --help to read on what the install command does and what you need to specify to run it, for example. Of course, reading the pip documentation is another great place to start.

$ pip install --help

Usage:
  pip install [options] <requirement specifier> [package-index-options] ...
  pip install [options] -r <requirements file> [package-index-options] ...
  pip install [options] [-e] <vcs project url> ...
  pip install [options] [-e] <local project path> ...
  pip install [options] <archive url/path> ...

Description:
  Install packages from:

  - PyPI (and other indexes) using requirement specifiers.
  - VCS project urls.
  - Local project directories.
  - Local or remote source archives.

  pip also supports installing from "requirements files", which provide
  an easy way to specify a whole environment to be installed.

Install Options:
  ...

Let’s take a quick detour and focus on the freeze command next, which will be a key one in dealing with dependencies. Running pip freeze displays a list of all installed Python packages. If I run it with my freshly installed virtual environment active, I should get an empty list, which is the case:

$ pip freeze

Now we can get the Python interpreter going by typing python on our terminal. Once that's done, let's try to import the emoji module, upon which Python will complain that there is no such module installed, and rightfully so, for we haven't installed it yet:

$ python
Python 3.5.0 (default)
[GCC 4.2.1 Compatible Apple LLVM 8.1.0 (clang-802.0.38)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import emoji
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'emoji'

To finally install the package, we can go ahead and run pip install emoji on our terminal. I get the following output:

$ pip install emoji==0.4.5
Collecting emoji==0.4.5
Installing collected packages: emoji
Successfully installed emoji-0.4.5

🚫 Getting a pip install “invalid syntax” error?

Please note that the pip install command needs to be run from the command-line inside your terminal program, and not inside the Python interpreter.

If you’re getting a “SyntaxError: invalid syntax” message from running pip install, then try leaving the interpreter with Ctrl+C and run the pip command again from the terminal prompt.

Pip is a program that installs modules, so you can use them from Python. Once you have installed the module, then you can open the Python shell and import the module.

When installing packages with pip, we can constrain pip to install a version of our preference, by using the following operators:

A specific version of the package (==):

$ pip install emoji==0.4.1

A version other than the specified one (!=):

$ pip install "emoji!=0.4.1"

A version equal to or greater than a specific version (>=):

$ pip install "emoji>=0.4.0"

A version of the package in the specified range (>=X.Y.T, <=X.Y.Z):

$ pip install "emoji>=0.4.0,<=0.4.9"

(The quotes keep your shell from interpreting characters like > and ! before pip ever sees them.)

For a full specification of the version specifiers, refer to this page. Generally the most useful specifier here is == to pip install a specific version of a package. If we don’t constrain pip, it will grab the latest version of the package.

You may be wondering why you would want to install an older version of a Python package in the first place:

Programmers freeze requirements to keep track of the versions of the different packages that are installed on development and production environments. One of the objectives is to be able to replicate the environments as needed. Dan’s course on Python dependency management goes into more detail on that topic.

Let’s continue on and run pip freeze again after installing the emoji package. You should now see it included in the list of all installed Python modules:

$ pip freeze
emoji==0.4.5

As expected, pip freeze now lists the emoji package as an added dependency with a specific version number.

I now go back to my Python interpreter session, and run import emoji, and this time around Python doesn’t complain, which is a good sign. I test it, and get the following output:

[Screenshot: emoji working on the Python interpreter]
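In case the screenshot doesn't come through, the session looked roughly like this (a sketch; the exact alias names supported depend on the emoji package version):

>>> import emoji
>>> print(emoji.emojize('Python is :thumbsup:', use_aliases=True))
Python is 👍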

Success, at last! We just installed and then imported a third-party Python module. Great job 🙂

It’s typical for an application to have several interdependent packages. For instance, running pip freeze on the virtual environment that I use to develop tumblingprogrammer.com will output the following list of modules:

appdirs==1.4.3
beautifulsoup4==4.6.0
Django==1.11.1
django-bootstrap3==8.2.3
django-crispy-forms==1.6.1
django-debug-toolbar==1.8
(...)
pyparsing==2.2.0
pytz==2017.2
PyYAML==3.12
selenium==3.4.1
six==1.10.0
sqlparse==0.2.3
tornado==4.5.1

That’s a total of 25 Python packages. And it’s a fairly simple application. Later on I’ll describe a way to visualize the interdependency between packages.

Capturing Installed Python Packages with Requirements Files

Developers get in the habit of freezing requirements every time that a package or a dependency gets installed on their projects. We do that by running the following pip command:

$ pip freeze > requirements.txt

This dumps the output of pip freeze into a requirements.txt file on the working directory.

Let’s assume now that for some reason we need to install MarkupSafe version 0.11. Let’s assume also that we have gone ahead, installed it, tested it, and that our app behaves as we expect it to.

Let’s run pip freeze, which would only output our two packages, as shown below:

$ pip freeze
emoji==0.4.5
MarkupSafe==0.11

To continue with our learning, let’s go ahead and install Flask, a popular web microframework. We’ll grab the latest version of it by running pip install flask.

I get the following output (if you are following along, yours might differ a little bit, for my computer had cached the files from a previous install):

$ pip install flask
Collecting flask
  Using cached Flask-0.12.2-py2.py3-none-any.whl
Collecting itsdangerous>=0.21 (from flask)
Collecting Jinja2>=2.4 (from flask)
  Using cached Jinja2-2.9.6-py2.py3-none-any.whl
Collecting click>=2.0 (from flask)
  Using cached click-6.7-py2.py3-none-any.whl
Collecting Werkzeug>=0.7 (from flask)
  Using cached Werkzeug-0.12.2-py2.py3-none-any.whl
Collecting MarkupSafe>=0.23 (from Jinja2>=2.4->flask)
Installing collected packages: itsdangerous, MarkupSafe, Jinja2, click, Werkzeug, flask
  Found existing installation: MarkupSafe 0.11
    Uninstalling MarkupSafe-0.11:
      Successfully uninstalled MarkupSafe-0.11
Successfully installed Jinja2-2.9.6 MarkupSafe-1.0 Werkzeug-0.12.2 click-6.7 flask-0.12.2 itsdangerous-0.24

Flask, being a more complex package, has some dependencies (Werkzeug, itsdangerous, etc.) which are installed with it automatically through the pip install command.

I want to call your attention to the following lines, extracted from the above listing:

...
  Found existing installation: MarkupSafe 0.11
    Uninstalling MarkupSafe-0.11:
      Successfully uninstalled MarkupSafe-0.11
...

Take a close look…

You’ll see that pip doesn’t have a way of reconciling conflicting dependencies. Without even warning us, it went ahead and replaced version 0.11 with version 1.0 of our MarkupSafe package. And that could be trouble for our application.

At that point in time, we run our app tests (assuming that we have them), and dig into our application to make sure that the changes between 0.11 and 1.0 of the MarkupSafe package don’t break it.

If I were to face this situation in real life, I would roll back the changes first by uninstalling Flask and its dependencies and restore the packages that I had before. Then I would upgrade MarkupSafe to 1.0, test to make sure that the application works as expected. And then—and only then—would I re-install Flask.

Assuming that we have gone through rolling back, upgrading, testing, and re-installing Flask, if we run pip freeze now, we get 7 packages in total:

$ pip freeze
click==6.7
emoji==0.4.5
Flask==0.12.2
itsdangerous==0.24
Jinja2==2.9.6
MarkupSafe==1.0
Werkzeug==0.12.2

Let’s go ahead and freeze our requirements into a requirements.txt file by running pip freeze > requirements.txt.

Now we’re going to add another package with dependencies to increase the complexity of our setup. We’ll install version 6.0 of a package called alembic by running:

$ pip install alembic==0.6
Collecting alembic==0.6
Collecting Mako (from alembic==0.6)
Collecting SQLAlchemy>=0.7.3 (from alembic==0.6)
Requirement already satisfied: MarkupSafe>=0.9.2 in /Users/puma/.virtualenvs/pip-tutorial/lib/python3.5/site-packages (from Mako->alembic==0.6)
Installing collected packages: Mako, SQLAlchemy, alembic
Successfully installed Mako-1.0.7 SQLAlchemy-1.1.11 alembic-0.6.0

I now call your attention to the following line from the above listing:

...
Requirement already satisfied: MarkupSafe>=0.9.2 in /Users/puma/.virtualenvs/pip-tutorial/lib/python3.5/site-packages (from Mako->alembic==0.6)
...

This means that alembic also depends on MarkupSafe. More complexity, huh? Let’s run pip freeze:

$ pip freeze
alembic==0.6.0
click==6.7
emoji==0.4.5
Flask==0.12.2
itsdangerous==0.24
Jinja2==2.9.6
Mako==1.0.7
MarkupSafe==1.0
SQLAlchemy==1.1.11
Werkzeug==0.12.2

The listing above showing all the packages on our emoji application is not very helpful at the moment, for it doesn’t give us info on dependencies (it only lists packages in alphabetical order). Let’s fix that.

Visualizing Installed Packages

One good package to have installed on our environment is pipdeptree, which shows the dependency tree of packages. Let’s go ahead and install the latest version of it by running the following command:

$ pip install pipdeptree

Once it’s done, let’s run pip freeze to see what we get:

$ pip freeze
alembic==0.6.0
click==6.7
emoji==0.4.5
Flask==0.12.2
itsdangerous==0.24
Jinja2==2.9.6
Mako==1.0.7
MarkupSafe==1.0
pipdeptree==0.10.1
SQLAlchemy==1.1.11
Werkzeug==0.12.2

We now get 11 packages, as we have added pipdeptree, which had no dependencies. Let’s run pipdeptree on the terminal to see what it does. Below is the output that I get on my machine:

$ pipdeptree
alembic==0.6.0
  - Mako [required: Any, installed: 1.0.7]
    - MarkupSafe [required: >=0.9.2, installed: 1.0]
  - SQLAlchemy [required: >=0.7.3, installed: 1.1.11]
emoji==0.4.5
Flask==0.12.2
  - click [required: >=2.0, installed: 6.7]
  - itsdangerous [required: >=0.21, installed: 0.24]
  - Jinja2 [required: >=2.4, installed: 2.9.6]
    - MarkupSafe [required: >=0.23, installed: 1.0]
  - Werkzeug [required: >=0.7, installed: 0.12.2]
pipdeptree==0.10.1
  - pip [required: >=6.0.0, installed: 9.0.1]
setuptools==36.2.0
wheel==0.29.0

We notice much more useful information here, including dependencies, and the minimum versions required for dependent packages to work properly.

Notice, once again, how MarkupSafe is listed twice, as both Jinja2 (and Flask) and Mako (and alembic) depend on it. That’s very useful information to troubleshoot things gone ugly.

We also notice other packages here that pip freeze doesn’t list, including pip, setuptools and wheel. The reason is that by default pip freeze doesn’t list packages that pip itself depends on.

We can use the --all flag to show also those packages. Let’s test this by running pip freeze --all, in which case we get:

$ pip freeze --all
alembic==0.6.0
click==6.7
emoji==0.4.5
Flask==0.12.2
itsdangerous==0.24
Jinja2==2.9.6
Mako==1.0.7
MarkupSafe==1.0
pip==9.0.1
pipdeptree==0.10.1
setuptools==36.2.0
SQLAlchemy==1.1.11
Werkzeug==0.12.2
wheel==0.29.0

Another benefit of using pipdeptree is that it warns us about conflicting dependencies, including circular ones (where packages depend on one another), but I have yet to see that in action; so far I couldn’t replicate the functionality on my system. You can find more info about the tool on its PyPI page.

Installing Python Packages From a requirements.txt File

If you have a requirements.txt file, you can install all the packages listed there by running the following command:

$ pip install -r /path/to/the/file/requirements.txt

This is very handy when we want to replicate environments and have access to a requirements.txt file that reflects their makeup.

Uninstalling Python Packages With Pip

In this section you’ll see how to uninstall individual Python packages from your system or active virtual environment, how you can remove multiple packages at once with a single command, and how you can remove all installed Python packages.

Uninstalling individual packages:

You can do so by running, as an example, pip uninstall alembic. Let’s do that on our setup to see what happens. Here is the output on my end:

$ pip uninstall alembic
Uninstalling alembic-0.6.0:
  /Users/puma/.virtualenvs/pip-tutorial/bin/alembic
  ... a bunch on other files ...
  /Users/puma/.virtualenvs/pip-tutorial/lib/python3.5/site-packages/alembic/util.py
Proceed (y/n)? y
  Successfully uninstalled alembic-0.6.0

Let’s run pipdeptree to see what our setup looks like:

$ pipdeptree
emoji==0.4.5
Flask==0.12.2
  - click [required: >=2.0, installed: 6.7]
  - itsdangerous [required: >=0.21, installed: 0.24]
  - Jinja2 [required: >=2.4, installed: 2.9.6]
    - MarkupSafe [required: >=0.23, installed: 1.0]
  - Werkzeug [required: >=0.7, installed: 0.12.2]
Mako==1.0.7
  - MarkupSafe [required: >=0.9.2, installed: 1.0]
pipdeptree==0.10.1
  - pip [required: >=6.0.0, installed: 9.0.1]
setuptools==36.2.0
SQLAlchemy==1.1.11
wheel==0.29.0

If you look carefully, you may notice that the alembic dependencies are still present, because pip uninstall does not get rid of them, by design.

We have to manually do that (there are other options, which we will cover below). Therefore, it is extremely important that we freeze our requirements and commit changes to our requirements.txt file every time that we install or uninstall packages so we know what our setup should look like if we need to roll back changes.

Uninstalling multiple Python packages at once:

You can also uninstall several packages at once, by using the following command-line syntax:

$ pip uninstall package1 package2 ...

Another option is reading the list of packages to uninstall from a requirements file. Similar to its install counterpart, if you have a requirements.txt file, you can uninstall all the packages listed there like so:

$ pip uninstall -r /path/to/the/file/requirements.txt

Note that we could wipe out all the packages on our setup, which could actually be quite useful. Let’s take a look at an example.

The output below is a list of my git commits log (gl is an alias on my bash profile for a prettified git log):

$ gl
* 40f4f37 - (HEAD -> master) all packages in (37 minutes ago) <Jose Pumar>
* 2d00cf5 - emoji + markupsafe + flask + alembic (56 minutes ago) <Jose Pumar>
* e52002b - emoji + MarkupSafe + Flask (84 minutes ago) <Jose Pumar>
* 9c48895 - emoji + MarkupSafe (86 minutes ago) <Jose Pumar>
* 3a797b3 - emoji + MarkSafe (2 hours ago) <Jose Pumar>
* ... other commits...

If I change my mind and decide that I don’t need alembic any more, I could delete all the packages by running pip uninstall -r requirements.txt while on commit 40f4f37 (the HEAD).

If I do it, it gives me a bunch of warnings and asks me if I want to proceed several times (once for each package), to which I say yes. I could have avoided that by using the flag -y, as in:

$ pip uninstall -y -r requirements.txt

The -y flag tells pip not to ask for confirmation of uninstall deletions. If we run pip freeze after this operation, we’ll get an empty packages list, which is what we want.

We then checkout commit e52002b (the last safe commit before we installed alembic), and run pip install -r requirements.txt to reinstate the packages that we had at that point in time.

Removing all installed Python packages:

Sometimes it can be useful to remove all installed packages in a virtual environment or on your system Python install. It can help you get back to a clean slate.

Running the following command will uninstall all Python packages in the currently active environment:

$ pip freeze | xargs pip uninstall -y

This command works by first listing all installed packages using the freeze command, and then feeding the list of packages into the pip uninstall command to remove them.

Adding the -y flag automatically confirms the uninstallation so you don’t have to stick around hammering the “y” key on your keyboard.

Installing and Uninstalling Python Packages with the “pip” Package Manager – Conclusion

We covered a lot of ground and shed light on the key commands and major challenges that you may face when installing and uninstalling Python packages and their dependencies.

In summary, the workflow for the installation of a Python package with pip is as follows:

  1. Make sure that you are using a virtual environment.
  2. Identify the need for a new package.
  3. Research potential candidate packages: Look for the maturity of the package, its documentation, etc. See what you can find regarding its dependencies. For example, other packages that have to be installed so the package works properly. Sometimes the documentation refers to them.
  4. Install the package and its dependent packages: pip will do this for you. Look for version upgrades in the pip installation log.
  5. Test your application to make sure that the package meets your needs and that the package and or its dependent packages don’t break it.
  6. Freeze your requirements: Run pip freeze > requirements.txt if tests show your application is still okay and works as intended.
  7. Commit the changes to Git or the version control tool of your choice.
  8. Repeat.

There is a lot more to cover, especially when it comes to dependency management, which has long-term implications on how we setup and configure our Python projects.

Such complexity is one of the factors that make it necessary to implement different settings and configurations to account for the distinct needs of our development, staging, and production environments.

Happy pip-ing!

July 25, 2017 12:00 AM

July 24, 2017


Python Anywhere

Outage report: 20, 21 and 22 July 2017

We had several outages over the last few days. The problem appears to be fixed now, but investigations into the underlying cause are still underway. This post is a summary of what happened, and what we know so far. Once we've got a better understanding of the issue, we'll post more.

It's worth saying at the outset that while the problems related to the way we manage our users' files, those files themselves were always safe. While availability problems are clearly a big issue, we regard data integrity as more important.

20 July: the system update

On Thursday 20 July, at 05:00 UTC, we released a new system update for PythonAnywhere. This was largely an infrastructural update. In particular, we updated our file servers from Ubuntu 14.04 to 16.04, as part of a general upgrade of all servers in our cluster.

File servers are, of course, the servers that manage the files in your PythonAnywhere disk storage. Each server handles the data for a specific set of users, and serves the files up to the other servers in the cluster that need them -- the "execution" servers where your websites, your scheduled tasks, and your consoles run. The files themselves are stored on network-attached storage (and mirrored in realtime to redundant disks on a separate set of backup servers); the file servers simply act as NFS servers and manage a few simple things like disk quotas.

While the system update took a little longer than we'd planned, once everything was up and running, the system looked stable and all monitoring looked good.

20 July: initial problems

At 12:07 UTC our monitoring system showed a very brief issue. From some of our web servers, it appeared that access to one of our file servers, file-2, had very briefly slowed right down -- it was taking more than 30 seconds to list the specific directory that is monitored. The problem cleared up after about 60 seconds. Other file servers were completely unaffected. We did some investigations, but couldn't find anything, so we chalked it up as a glitch and kept an eye out for further problems.

At 14:12 UTC it happened again, and then over the course of the afternoon, the "glitches" became more frequent and started lasting longer. We discovered that the symptom from the file server's side was that all of the NFS daemons -- the processes that together make up an NFS server -- would become busy; system load would rise from about 1.5 to 64 or so. They were all waiting uninterruptibly on what we think was disk I/O (status "D" in top).

The problem only affected file-2 -- other file servers were all fine. Given that every file server had been upgraded to an identical system image, our initial suspicion was that there might be some kind of hardware problem. At 17:20 UTC we got in touch with AWS to discuss whether this was likely.

By 19:10 our discussions with AWS had revealed nothing of interest. The "glitches" had become a noticeable problem for users, and we decided that while there was no obvious sign of hardware problems, it would be good to at least eliminate that as a possible cause. So we took a snapshot of all disks containing user data (for safety), then migrated the server to new hardware, causing a 20-minute outage for users on that file server (who were already seeing a serious slowdown anyway), and a 5-minute outage for everyone else -- the latter because we had to reboot the execution servers.

After this move, at 19:57 UTC, everything seemed OK. Our monitoring was clear, and the users we were in touch with confirmed that everything was looking good.

21 July: the problem continues

At 14:31 UTC on 21 July, we saw another glitch on our monitoring. Again, the problem cleared up quickly, but we started looking again into what could possibly be the cause. There were further glitches at 15:17 and 16:51, but then the problem seemed to clear up.

Unfortunately at 22:44 it flared up again. Again, the issues started happening more frequently, and lasting longer each time, until they became very noticeable for our users at around 00:30 UTC. At 00:55 UTC we decided to move the server to different hardware again -- there's no official way to force a move to new hardware on AWS; stopping and starting an instance usually does it, but there's a small chance you'd end up on the same physical host again, so a second attempt seemed worthwhile. If nothing else, it would hopefully at least clear things up for another 12 hours or so and buy us time to work out what was really going wrong.
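
(For the curious, that stop/start dance can be scripted with boto3 along these lines; this is a sketch with a made-up instance ID, not our actual tooling:)

import boto3

ec2 = boto3.client('ec2')
instance_id = 'i-0123456789abcdef0'  # made-up instance ID

# stopping and then starting an EC2 instance usually lands it on a new physical host
ec2.stop_instances(InstanceIds=[instance_id])
ec2.get_waiter('instance_stopped').wait(InstanceIds=[instance_id])
ec2.start_instances(InstanceIds=[instance_id])
ec2.get_waiter('instance_running').wait(InstanceIds=[instance_id])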

This time, things didn't go according to plan. The file server failed to come up on the new hardware, and trying to move again did not help. We decided that we were going to need to provision a completely fresh file server, and move the disks across. While we have processes in place for replacing file servers as part of a normal system update, and for moving them to new hardware without changing (for example) their IP address, replacing one under these circumstances is not a procedure we've done before. Luckily, it went as well as could be expected under the circumstances. At 01:23 UTC we'd worked out what to do and started the new file server. By 01:50 we'd started the migration, and by 02:20 UTC everything was moved over. There were a few remaining glitches, but these were cleared up by 02:45 UTC.

23 July -- more problems -- and a resolution?

We did not expect the fix we'd put in to be a permanent solution -- though we did have a faint hope that perhaps the problem had been caused by some configuration issue on file-2, which might have been remediated by our having provisioned a new server rather than simply moving the old one. This was never a particularly strong hope, however, and when the problems started again at 12:16 UTC we weren't entirely surprised.

We had generated two new hypotheses about the possible cause of these issues by now: perhaps the number of NFS daemon processes we were running was somehow wrong for the upgraded OS, or perhaps the problem lay with the Ubuntu 16.04 upgrade itself.

The problem with both of these hypotheses was that only one of our file servers was affected. All file servers had the same number of workers, and all had been upgraded to 16.04.

Still, it was worth a try, we thought. We decided to try changing the number of daemon processes first, as we believed it would cause minimal downtime; however, we started up a new file server on 14.04 so that it would be ready just in case.

At 14:41 UTC we reduced the number of workers to eight. We were happy to see that this was picked up across the cluster without any need to reboot anything, so there was no downtime.

Unfortunately, at 15:04, we saw another problem. We decided to spend more time investigating a few ideas that had occurred to us before taking the system down again. At 19:00 we tried increasing the number of NFS processes to 128, but that didn't help. At 19:23 we decided to go ahead with switching over to the 14.04 server we'd prepared earlier. We kicked off some backup snapshots of the user data, just in case there were problems, and at 19:38 we started the migration over.

This completed at 19:46, but required a restart of all of the execution servers in order to pick up the new file server. We started this process immediately, and web servers came back online at 19:48, consoles at 19:50, and scheduled tasks at 19:55.

By 20:00 we were satisfied that everything looked good, and so we went back to monitoring.

Where we are now

Since the update on Saturday, there were no monitoring glitches at all on Sunday, but we did see one potential problem on Monday at 12:03. However, this blip was noticed from only one of our web servers (previous issues affected at least three at a time, and sometimes as many as nine), and it has not been followed by any subsequent problems in the four hours since, which is somewhat reassuring.

We're continuing to monitor closely, and are brainstorming hypotheses to explain what might have happened (or, perhaps still be happening). Of particular interest is the fact that this issue only affected one of our file servers, despite all of them having been upgraded. One possibility we're considering is that the correlation in timing with the upgrade is simply a red herring -- that instead there's some kind of access pattern, some particular pattern of reads/writes to the storage, which only started at around midday on Thursday after the system update. We're planning possible ways to investigate that should the problem occur again.

Either way, whether the problem is solved now or not, we clearly have much more investigation to do. We'll post again when we have more information.

July 24, 2017 04:44 PM


Will Kahn-Greene

Soloists: code review on a solo project

Summary

I work on some projects with other people, but I also spend a lot of time working on projects by myself. When I'm working by myself, I have difficulties with the following:

  1. code review
  2. bouncing ideas off of people
  3. pair programming
  4. long slogs
  5. getting help when I'm stuck
  6. publicizing my work
  7. dealing with loneliness
  8. going on vacation

I started a #soloists group at Mozilla figuring there are a bunch of other Mozillians who are working on solo projects and maybe if we all work alone together, then that might alleviate some of the problems of working solo. We hang out in the #soloists IRC channel on irc.mozilla.org. If you're solo, join us!

I keep thinking about writing a set of blog posts for things we've talked about in the channel and how I do things. Maybe they'll help you.

This one covers code review.

July 24, 2017 04:00 PM


Doug Hellmann

hmac — Cryptographic Message Signing and Verification — PyMOTW 3

The HMAC algorithm can be used to verify the integrity of information passed between applications or stored in a potentially vulnerable location. The basic idea is to generate a cryptographic hash of the actual data combined with a shared secret key. The resulting hash can then be used to check the transmitted or stored message …
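
A minimal sketch of that idea using Python's standard hmac module (the key and message are placeholders, not from the article):

import hashlib
import hmac

SECRET_KEY = b'shared-secret'         # placeholder shared key
message = b'message worth verifying'  # placeholder payload

# sign: hash the message together with the shared secret
signature = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()

# verify: recompute the digest and compare in constant time
expected = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
print(hmac.compare_digest(signature, expected))  # True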

July 24, 2017 01:00 PM


Python Software Foundation

2017 Bylaw Changes

The PSF has changed its bylaws, following a discussion and vote among the voting members. I'd like to publicly explain those changes.

For each of the changes, I will describe 1) what the bylaws used to say prior to June 2017, 2) what the new bylaws say, and 3) why the changes were implemented.

Certification of Voting Members
Every member had to certify each year whether or not they wanted to vote.
The bylaws now say that the list of voters is based on criteria decided upon by the board.
The previous bylaws pertaining to this topic created too much work for our staff to handle, and sometimes the certification was not done because we did not have the time resources to do it. We can now change the certification to something more manageable for our staff and our members.

Voting in New PSF Fellow Members
We did not have a procedure in place for this in the previous bylaws.
Now the bylaws allow any member to nominate a Fellow. Additionally, they give the PSF Board the chance to create a work group for evaluating the nominations.
We lacked a procedure. We had several inquiries and nominations in the past, but did not have a policy to respond with. Now that this bylaw has been voted in, the PSF Board has voted to create the Work Group, and after several years we can begin accepting new Fellow members again.

Staggered Board Terms
We did not have staggered board terms prior to June 2017. Every director would be voted on every term.
The bylaws now say that in the June election, the top 4 voted-in directors hold 3-year terms, the next 4 voted-in directors hold 2-year terms, and the next 3 voted-in directors hold 1-year terms. That resulted in:
  1. Naomi Ceder (3 yr)
  2. Eric Holscher (3 yr)
  3. Jackie Kazil (3 yr)
  4. Paul Hildebrandt (3 yr)
  5. Lorena Mesa (2 yr)
  6. Thomas Wouters (2 yr)
  7. Kushal Das (2 yr)
  8. Marlene Mhangami (2 yr)
  9. Kenneth Reitz (1 yr)
  10. Trey Hunner (1 yr)
  11. Paola Katherine Pacheco (1 yr)
The main push behind this change is continuity. As the PSF continues to grow, we are hoping to make it more stable and sustainable. Having some directors in place for more than one year will help us better complete short-term and long-term projects. It will also help us pass on context from previous discussions and meetings.

Direct Officers
We did not have Direct Officers prior to June 2017.
The bylaws state that the current General Counsel and Director of Operations will be the Direct Officers of the PSF. Additionally, they state that the Direct Officers become the 12th and 13th members of the board, giving them the right to vote on board business. Direct Officers can be removed by a) failing an approval vote, held on at least the same schedule as 3-year-term directors; b) leaving the office associated with the officer director position; or c) failing a no-confidence vote.
In an effort to become a more stable and mature board, we are appointing two important positions to be directors of the board. Having the General Counsel and Director of Operations on the board gives us more strength in legal situations and in how the PSF operates. The two new Direct Officers are:
  1. Van Lindberg
  2. Ewa Jodlowska

Delegating Ability to Set Compensation
The bylaws used to state that the President of the Foundation would direct how compensation of the Foundation’s employees was decided.
The bylaws have changed so that the Board of Directors decides how employee compensation is set.
This change was made because even though we keep the president informed of major changes, Guido does not participate in day-to-day operations or employee management. We wanted the bylaws to reflect the most effective and fair way we set compensation for our staff.

We hope this breakdown sheds light on the changes and why they were important to implement. Please feel free to contact me with any questions or concerns.

July 24, 2017 11:35 AM


A. Jesse Jiryu Davis

Vote For Your Favorite PyGotham Talks

[Image: black and white photograph of voters in 1930s-era British dress, lined up at a wooden table and consulting with poll workers checking voter rolls]

We received 195 proposals for talks at PyGotham this year. Now we have to find the best 50 or so. For the first time, we’re asking the community to vote on their favorite talks. Voting will close August 7th; then I and my comrades on the Program Committee will make a final selection.

Your Mission, If You Choose To Accept It

We need your help judging which proposals are the highest quality and the best fit for our community’s interests. For each talk we’ll ask you one question: “Would you like to see this talk at PyGotham?” Remember, PyGotham isn’t just about Python: it’s an eclectic conference about open source technology, policy, and culture.

You can give each talk one of three votes: +1, 0, or -1.

You can sign up for an account and begin voting at vote.pygotham.org. The site presents you with talks in random order, omitting the ones you have already voted on. For each talk, you will see this form:

[Image: the +1/0/-1 voting form]

Click “Save Vote” to make sure your vote is recorded. Once you do, a button appears to jump to the next proposal.

Our thanks to Ned Jackson Lovely, who made this possible by sharing the talk voting app “progcom” that was developed for the PyCon US committee.

So far, about 50 people have cast votes. We need to hear from you, too. Please help us shape this October’s PyGotham. Vote today!


Image: Voting in Brisbane, 1937

July 24, 2017 10:57 AM


Catalin George Festila

Fix Gimp with python script.

Today I will show you how the Python language can help GIMP users.
From my point of view, Gimp does not properly import frames from GIF files.

Using Python, you can get the correct frames from the GIF file.
Here's my script, which uses the PIL module.

import sys
from PIL import Image, ImageSequence

try:
    img = Image.open(sys.argv[1])
except IOError:
    print "Can't load", sys.argv[1]
    sys.exit(1)

pal = img.getpalette()
prev = img.convert('RGBA')
prev_dispose = True
for i, frame in enumerate(ImageSequence.Iterator(img)):
    dispose = frame.dispose

    if frame.tile:
        # the frame only updates a sub-rectangle of the full image
        x0, y0, x1, y1 = frame.tile[0][1]
        if not frame.palette.dirty:
            frame.putpalette(pal)
        frame = frame.crop((x0, y0, x1, y1))
        bbox = (x0, y0, x1, y1)
    else:
        bbox = None

    if dispose is None:
        # no disposal: paste the frame onto the previous result
        prev.paste(frame, bbox, frame.convert('RGBA'))
        prev.save('result_%03d.png' % i)
        prev_dispose = False
    else:
        if prev_dispose:
            # start again from a fully transparent canvas
            prev = Image.new('RGBA', img.size, (0, 0, 0, 0))
        out = prev.copy()
        out.paste(frame, bbox, frame.convert('RGBA'))
        out.save('result_%03d.png' % i)
Name the script convert_gif.py and then you can run it on a GIF file as follows:
C:\Python27>python.exe convert_gif.py 0001.gif
The final result has a smaller number of images than in Gimp, but this was to be expected.

July 24, 2017 10:24 AM

July 23, 2017


Kevin Dahlhausen

Using Beets from 3rd Party Python Applications

I am thinking of using Beets as the music library in a project I'm updating. The only example of using it this way is the source code of the Beets command-line interface. That code is well-written but does much more than I need, so I decided to create a simple example of using Beets in a 3rd party application.

The hardest part turned out to be determining how to create a proper configuration programmatically. The final code is short:


        config["import"]["autotag"] = False
        config["import"]["copy"] = False
        config["import"]["move"] = False
        config["import"]["write"] = False
        config["library"] = music_library_file_name
        config["threaded"] = True 

This will create a configuration that keeps the music files in place and does not attempt to autotag them.

Importing files requires one to subclass importer.ImportSession. A simple session that imports files without changing them is:


    class AutoImportSession(importer.ImportSession):
        "a minimal session class for importing that does not change files"

        def should_resume(self, path):
            return True

        def choose_match(self, task):
            return importer.action.ASIS

        def resolve_duplicate(self, task, found_duplicates):
            pass

        def choose_item(self, task):
            return importer.action.ASIS 

That’s the trickiest part of it. The full demo is:


# Copyright 2017, Kevin Dahlhausen
#
# Permission is hereby granted, free of charge, to any person obtaining
# a copy of this software and associated documentation files (the
# "Software"), to deal in the Software without restriction, including
# without limitation the rights to use, copy, modify, merge, publish,
# distribute, sublicense, and/or sell copies of the Software, and to
# permit persons to whom the Software is furnished to do so, subject to
# the following conditions:
#
# The above copyright notice and this permission notice shall be
# included in all copies or substantial portions of the Software.

from beets import config
from beets import importer
from beets.ui import _open_library

class Beets(object):
    """a minimal wrapper for using beets in a 3rd party application
       as a music library."""

    class AutoImportSession(importer.ImportSession):
        "a minimal session class for importing that does not change files"

        def should_resume(self, path):
            return True

        def choose_match(self, task):
            return importer.action.ASIS

        def resolve_duplicate(self, task, found_duplicates):
            pass

        def choose_item(self, task):
            return importer.action.ASIS

    def __init__(self, music_library_file_name):
        """ music_library_file_name = full path and name of
            music database to use """
        "configure to keep music in place and do not auto-tag"
        config["import"]["autotag"] = False
        config["import"]["copy"] = False
        config["import"]["move"] = False
        config["import"]["write"] = False
        config["library"] = music_library_file_name
        config["threaded"] = True

        # create/open the beets library
        self.lib = _open_library(config)

    def import_files(self, list_of_paths):
        """import/reimport music from the list of paths.
            Note: This may need some kind of mutex as I
                  do not know the ramifications of calling
                  it a second time if there are background
                  import threads still running.
        """
        query = None
        loghandler = None  # or log.handlers[0]
        self.session = Beets.AutoImportSession(self.lib, loghandler,
                                               list_of_paths, query)
        self.session.run()

    def query(self, query=None):
        """return list of items from the music DB that match the given query"""
        return self.lib.items(query)

if __name__ == "__main__":

    import os

    # this demo places music.db in same lib as this file and
    # imports music from <this dir>/Music
    path_of_this_file = os.path.dirname(__file__)
    MUSIC_DIR = os.path.join(path_of_this_file, "Music")
    LIBRARY_FILE_NAME = os.path.join(path_of_this_file, "music.db")

    def print_items(items, description):
        print("Results when querying for "+description)
        for item in items:
            print("   Title: {} by '{}' ".format(item.title, item.artist))
            print("      genre: {}".format(item.genre))
            print("      length: {}".format(item.length))
            print("      path: {}".format(item.path))
        print("")

    demo = Beets(LIBRARY_FILE_NAME)

    # import music - this demo does not move, copy or tag the files
    demo.import_files([MUSIC_DIR, ])

    # sample queries:
    items = demo.query()
    print_items(items, "all items")

    items = demo.query(["artist:heart,", "title:Hold", ])
    print_items(items, 'artist="heart" or title contains "Hold"')

    items = demo.query(["genre:Hard Rock"])
    print_items(items, 'genre = Hard Rock') 

I hope this helps. It turns out it is easy to use Beets in other apps.

July 23, 2017 08:47 PM


Mike Driscoll

Python is #1 in 2017 According to IEEE Spectrum

It’s always fun to see what languages are considered to be in the top ten. This year, IEEE Spectrum named Python the #1 language in the Web and Enterprise categories. Some of the Python community over at Reddit think that the scoring of the languages is flawed because JavaScript is below R in web programming. That gives me pause as well. Frankly, I don’t really see how anything is above JavaScript when it comes to web programming.

Regardless, it’s still interesting to read through the article.

July 23, 2017 06:53 PM


NumFOCUS

Meet our GSoC Students Part 3: Matplotlib, PyMC3, FEniCS, MDAnalysis, Data Retriever, & Gensim

July 23, 2017 05:00 PM


Trey Hunner

Craft Your Python Like Poetry

Line length is a big deal… programmers argue about it quite a bit. PEP 8, the Python style guide, recommends a 79 character maximum line length but concedes that a line length up to 100 characters is acceptable for teams that agree to use a specific longer line length.

So 79 characters is recommended… but isn’t line length completely obsolete? After all, programmers are no longer restricted by punch cards, teletypes, and 80 column terminals. The laptop screen I’m typing this on can fit about 200 characters per line.

Line length is not obsolete

Line length is not a technical limitation: it’s a human-imposed limitation. Many programmers prefer short lines because long lines are hard to read. This is true in typography and it’s true in programming as well.

Short lines are easier to read.

In the typography world, a line length of 55 characters per line is recommended for electronic text (see line length on Wikipedia). That doesn’t mean we should use a 55 character limit though; typography and programming are different.

Python isn’t prose

Python code isn’t structured like prose. English prose is structured in flowing sentences: each line wraps into the next. In Python, statements are somewhat like sentences, meaning each statement begins at the start of its own line.

Python code is more like poetry than prose. Poets and Python programmers don’t wrap lines once they hit an arbitrary length; they wrap lines when they make sense for readability and beauty.

I stand amid the roar Of a surf-tormented shore, And I hold within my hand
Grains of the golden sand— How few! yet how they creep Through my fingers to
the deep, While I weep—while I weep! O God! can I not grasp Them with a
tighter clasp? O God! can I not save One from the pitiless wave? Is all that we
see or seem But a dream within a dream?

Don’t wrap lines arbitrarily. Craft each line with care to help readers experience your code exactly the way you intended.

I stand amid the roar
Of a surf-tormented shore,
And I hold within my hand
Grains of the golden sand—
How few! yet how they creep
Through my fingers to the deep,
While I weep—while I weep!
O God! can I not grasp
Them with a tighter clasp?
O God! can I not save
One from the pitiless wave?
Is all that we see or seem
But a dream within a dream?

Examples

It’s not possible to make a single rule for when and how to wrap lines of code. PEP 8 discusses line wrapping briefly, but it covers only one case and offers three different acceptable styles, leaving the reader to choose which is best.

Line wrapping is best discussed through examples. Let’s look at a few examples of long lines and a few variations of line wrapping for each.

Example: Wrapping a Comprehension

This line of code is over 79 characters long:

employee_hours = [schedule.earliest_hour for employee in self.public_employees for schedule in employee.schedules]

Here we’ve wrapped that line of code so that it’s two shorter lines of code:

employee_hours = [schedule.earliest_hour for employee in
                  self.public_employees for schedule in employee.schedules]

We’re able to insert that line break in this line because we have an unclosed square bracket. This is called an implicit line continuation. Python knows we’re continuing a line of code whenever there’s a line break inside unclosed square brackets, curly braces, or parentheses.

This code still isn’t very easy to read because the line break was inserted arbitrarily. We simply wrapped this line just before a specific line length. We were thinking about line length here, but we completely neglected to think about readability.

This code is the same as above, but we’ve inserted line breaks in very particular places:

employee_hours = [schedule.earliest_hour
                  for employee in self.public_employees
                  for schedule in employee.schedules]

We have two line breaks here, and we’ve purposely inserted them before the for clauses in this list comprehension.

Statements have logical components that make up a whole, the same way sentences have clauses that make up the whole. We’ve chosen to break up this list comprehension by inserting line breaks between these logical components.

Here’s another way to break up this statement:

employee_hours = [
    schedule.earliest_hour
    for employee in self.public_employees
    for schedule in employee.schedules
]

Which of these methods you prefer is up to you. It’s important to make sure you break up the logical components though. And whichever method you choose, be consistent!

Example: Function Calls

This is a Django model field with a whole bunch of arguments being passed to it:

default_appointment = models.ForeignKey(othermodel='AppointmentType',
                                        null=True, on_delete=models.SET_NULL,
                                        related_name='+')

We’re already using an implicit line continuation to wrap these lines of code, but again we’re wrapping this code at an arbitrary line length.

Here’s the same Django model field with one argument per line:

default_appointment = models.ForeignKey(othermodel='AppointmentType',
                                        null=True,
                                        on_delete=models.SET_NULL,
                                        related_name='+')

We’re breaking up the component parts (the arguments) of this statement onto separate lines.

We could also wrap this line by indenting each argument instead of aligning them:

default_appointment = models.ForeignKey(
    othermodel='AppointmentType',
    null=True,
    on_delete=models.SET_NULL,
    related_name='+'
)

Notice we’re also leaving that closing parenthesis on its own line. We could additionally add a trailing comma if we wanted:

default_appointment = models.ForeignKey(
    othermodel='AppointmentType',
    null=True,
    on_delete=models.SET_NULL,
    related_name='+',
)

Which of these is the best way to wrap this line?

Personally for this line I prefer that last approach: each argument on its own line, the closing parenthesis on its own line, and a comma after each argument.

It’s important to decide what you prefer, reflect on why you prefer it, and always maintain consistency within each project/file you create. And keep in mind that consistency of your personal style is less important than consistency within a single project.

Example: Chained Function Calls

Here’s a long line of chained Django queryset methods:

books = Book.objects.filter(author__in=favorite_authors).select_related('author', 'publisher').order_by('title')

Notice that there aren’t parentheses around this whole statement, so the only place we can currently wrap our lines is inside those parentheses. We could do something like this:

books = Book.objects.filter(
    author__in=favorite_authors
).select_related(
    'author', 'publisher'
).order_by('title')

But that looks kind of weird and it doesn’t really improve readability.

We could add backslashes at the end of each line to allow us to wrap at arbitrary places:

books = Book.objects\
    .filter(author__in=favorite_authors)\
    .select_related('author', 'publisher')\
    .order_by('title')

This works, but PEP8 recommends against this.

We could wrap the whole statement in parentheses, allowing us to use implicit line continuation wherever we’d like:

books = (Book.objects
    .filter(author__in=favorite_authors)
    .select_related('author', 'publisher')
    .order_by('title'))

It’s not uncommon to see extra parentheses added in Python code to allow implicit line continuations.

That indentation style is a little odd though. We could align our code with the opening parenthesis instead:

books = (Book.objects
         .filter(author__in=favorite_authors)
         .select_related('author', 'publisher')
         .order_by('title'))

Although I’d probably prefer to align the dots in this case:

books = (Book.objects
             .filter(author__in=favorite_authors)
             .select_related('author', 'publisher')
             .order_by('title'))

A fully indentation-based style works too (we’ve also moved objects to its own line here):

books = (
    Book
    .objects
    .filter(author__in=favorite_authors)
    .select_related('author', 'publisher')
    .order_by('title')
)

There are yet more ways to resolve this problem. For example, we could use intermediary variables to avoid line wrapping entirely, as in the sketch below.
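
A sketch of that intermediary-variable approach (favorite_books is a name invented for illustration):

favorite_books = Book.objects.filter(author__in=favorite_authors)
favorite_books = favorite_books.select_related('author', 'publisher')
books = favorite_books.order_by('title')

Each line stays short, at the cost of naming and re-binding an intermediate queryset.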

Chained methods pose a different problem for line wrapping than single method calls and require a different solution. Focus on readability when picking a preferred solution and be consistent with the solution you pick. Consistency lies at the heart of readability.

Example: Dictionary Literals

I often define long dictionaries and lists in Python code.

Here’s a dictionary definition that has been split over multiple lines, with line breaks inserted as a maximum line length is approached:

MONTHS = {'January': 1, 'February': 2, 'March': 3, 'April': 4, 'May': 5,
          'June': 6, 'July': 7, 'August': 8, 'September': 9, 'October': 10,
          'November': 11, 'December': 12}

Here’s the same dictionary with each key-value pair on its own line, aligned with the first key-value pair:

MONTHS = {'January': 1,
          'February': 2,
          'March': 3,
          'April': 4,
          'May': 5,
          'June': 6,
          'July': 7,
          'August': 8,
          'September': 9,
          'October': 10,
          'November': 11,
          'December': 12}

And the same dictionary again, with each key-value pair indented instead of aligned (with a trailing comma on the last line as well):

MONTHS = {
    'January': 1,
    'February': 2,
    'March': 3,
    'April': 4,
    'May': 5,
    'June': 6,
    'July': 7,
    'August': 8,
    'September': 9,
    'October': 10,
    'November': 11,
    'December': 12,
}

This is the strategy I prefer for wrapping long dictionaries and lists. I very often wrap short dictionaries and lists this way as well, for the sake of readability.

Python is Poetry

The moment of peak readability is the moment just after you write a line of code. Your code will be far less readable to you one day, one week, and one month after you’ve written it.

When crafting Python code, use spaces and line breaks to split up the logical components of each statement. Don’t write a statement on a single line unless it’s already very clear. If you break statements over multiple lines for clarity, line length shouldn’t be a major concern because your lines of code will mostly be far shorter than 79 characters already.

Make sure to craft your code carefully as you write it because your future self will have a much more difficult time cleaning it up than you will right now. So take that line of code you just wrote and carefully add line breaks to it.

July 23, 2017 05:00 PM


Kay Hayen

Nuitka Release 0.5.27

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This release comes with a lot of bug fixes and improvements.

Bug Fixes

  • Fix, need to add recursed modules immediately to the working set, or else they might first be processed in a second pass, where global names that are locally assigned get optimized to the built-in names, although that should not happen. Fixed in 0.5.26.1 already.
  • Fix, the accelerated call of methods could crash for some special types. This had been a regression in 0.5.25, but only happens with custom extension types. Fixed in 0.5.26.1 already.
  • Python3.5: For async def functions, parameter variables could fail to work properly with in-place assignments to them. Fixed in 0.5.26.4 already.
  • Compatibility: Decorators that overload type checks didn't pass the checks for compiled types. Now isinstance, and as a result the inspect module, work fine for them.
  • Compatibility: Fix, imports from __init__ were crashing the compiler. You are not supposed to do them, because they duplicate the package code, but they do work.
  • Compatibility: Fix, the super built-in on module level was crashing the compiler.
  • Standalone: For Linux, BSD and macOS, extension modules and shared libraries using their own $ORIGIN to find loaded DLLs resulted in those not being included in the distribution.
  • Standalone: Added more missing implicit dependencies.
  • Standalone: Fix, implicit imports now also can be optional, as e.g. _tkinter if not installed. Only include those if available.
  • The --recompile-c-only option was only working with the C compiler as a backend, but not in the C++ compatibility fallback, where files get renamed. This prevented the edit-and-retest debugging approach, at least with MSVC.
  • Plugins: The PyLint plug-in didn't consider the symbolic name import-error but only the code F0401.
  • Implicit exception raises in conditional expressions would crash the compiler.

New Features

  • Added support for Visual Studio 2017. Issue#368.
  • Added option --python2-for-scons to specify the Python2 executable to use for calling Scons. This should allow using Anaconda Python for that task.

Optimization

  • References to known unassigned variables are now statically optimized to exception raises and warned about if the corresponding option is enabled.
  • Unhashable keys in dictionaries are now statically optimized to exception raises and warned about if the corresponding option is enabled.
  • Enable forward propagation for classes too, resulting in some classes creating only static dictionaries. Currently this never happens for Python3, but it will, once we can statically optimize __prepare__ too.
  • Enable inlining of class dictionary creations if they are mere return statements of the created dictionary. Currently this never happens for Python3, see above for why.
  • Python2: Selecting the metaclass is now visible in the tree and can be statically optimized.
  • For executables, we now also use a freelist for traceback objects, which also makes exception cases slightly faster.
  • Generator expressions no longer require the use of a function call with a .0 argument value to carry the iterator value, instead their creation is directly inlined.
  • Remove "pass through" frames for Python2 list contractions, they are no longer needed. Minimal gain for generated code, but more lightweight at compile time.
  • When compiling Windows x64 with MinGW64 a link library needs to be created for linking against the Python DLL. This one is now cached and re-used if already done.
  • Use common code for NameError and UnboundLocalError exception raises. In some cases the full string was created at compile time, in others at run time. Since the latter is more efficient in terms of code size, we now use that everywhere, saving a bit of binary size.
  • Make sure to release unused functions from a module. This saves memory and can be decided after a full pass.
  • Avoid using OrderedDict in a couple of places where it is not needed and can be replaced with a later sorting, e.g. temporary variables by name, to achieve deterministic output. This saves memory at compile time.
  • Add specialized return nodes for the most frequent constant values, which are None, True, and False. Also a general one, for constant value return, which avoids the constant references. This saves quite a bit of memory and makes traversal of the tree a lot faster, due to not having any child nodes for the new forms of return statements.
  • Previously the empty dictionary constant reference was specialized to save memory. Now we also specialize empty set, list, and tuple constants to the same end. Also the hack to make is not claim that {} is {} was made more general: mutable constant references are now known to never alias.
  • The source references can be marked internal, which means that they should never be visible to the user, but that was tracked as a flag to each of the many source references attached to each node in the tree. Making a special class for internal references avoids storing this in the object, but instead it's now a class property.
  • The nodes for named variable reference, assignment, and deletion got split into separate nodes, one to be used before the actual variable can be determined during tree building, and one for use later on. This makes their API clearer and saves a tiny bit of memory at compile time.
  • Also eliminated target variable references, which were pseudo children of assignment and deletion nodes for variable names; they didn't really do much, but consumed processing time and memory.
  • Added optimization for calls to staticmethod and classmethod built-in methods along with type shapes.
  • Added optimization for open built-in on Python3, also adding the type shape file for the result.
  • Added optimization for bytearray built-in and constant values. These mutable constants can now be compile time computed as well.
  • Added optimization for frozenset built-in and constant values. These mutable constants can now be compile time computed as well.
  • Added optimization for divmod built-in.
  • Treat all built-in constant types, e.g. type itself as a constant. So far we did this only for constant values types, but of course this applies to all types, giving slightly more compact code for their uses.
  • Detect static raises if iterating over non-iterables and warn about them if the option is enabled.
  • Split the locals node into different types: one which needs the updated value, and one which just makes a copy. Properly track whether a function needs an updated locals dict, and if it doesn't, don't use one. This gives more efficient code for Python2 classes, and for exec-using functions in Python2.
  • Build all constant values without use of the pickle module, which has a lot more overhead than marshal; instead marshal is now used even for too-large long values, non-UTF8 unicode values, nan floats, etc.
  • Detect the linker arch for all Linux platforms using objdump instead of only a hand few hard coded ones.

Cleanups

  • The use of INCREASE_REFCOUNT got fully eliminated.
  • Use functions not vulnerable to buffer overflow. This is generally good and avoids warnings given on OpenBSD during linking.
  • Variable closure for classes is different from all functions, don't handle the difference in the base class, but for class nodes only.
  • Make sure mayBeNone doesn't return None, which normally means "unclear", but False instead, since it's always clear for those cases.
  • Comparison nodes were using the general comparison node as a base class, but now a proper base class was added instead, allowing for cleaner code.
  • Valgrind test runners got changed to using proper tool namespace for their code and share it.
  • Made construct case generation code common testing code for re-use in the speedcenter web site. The code also has minor beauty bugs which will then become fixable.
  • Use appdirs package to determine place to store the downloaded copy of depends.exe.
  • The code still mentioned C++ in a lot of places, in comments or identifiers, which might confuse readers of the code.
  • Code objects now carry all information necessary for their creation, and no longer need to access their parent to determine flag values. That parent is subject to change in the future.
  • Our import sorting wrapper automatically detects imports that could be local and makes them so, removing a few existing ones and preventing further ones in the future.
  • Cleanups and annotations to become Python3 PyLint clean as well. This found e.g. that source code references only had __cmp__ and need rich comparison to be fully portable.

Tests

  • The test runner for construct tests got cleaned up and the constructs now avoid using xrange so as to not need conversion for Python3 execution as much.
  • The main test runner got cleaned up and uses common code making it more versatile and robust.
  • Do not run a test in the debugger if CPython also segfaulted executing it; in that case it's not a Nuitka issue, so we can ignore it.
  • Improve the way the Python to test with is found in the main test runner: prefer the running interpreter, then PATH, and the registry on Windows. This will find the interesting version more often.
  • Added support for "Landscape.io" to ignore the inline copies of code, they are not under our control.
  • The test runner for Valgrind got merged with the usage for constructs and uses common code now.
  • Construct generation is now common code, intended for sharing it with the Speedcenter web site generation.
  • Rebased Python 3.6 test suite to 3.6.1 as that is the Python generally used now.

Organizational

  • Added inline copy of appdirs package from PyPI.
  • Added credits for RedBaron and isort.
  • The --experimental flag is now creating a list of indications and more than one can be used that way.
  • The PyLint runner can also work with Python3 pylint.
  • The Nuitka Speedcenter got more fine tuning and produces more tags to more easily identify trends in results. This needs to become more visible though.
  • The MSI files are also built on AppVeyor, where their building will not depend on me booting Windows. Getting these artifacts as downloads will be the next step.

Summary

This release improves many areas. The variable closure taking is now fully transparent due to different node types, the memory usage dropped again, a few obvious missing static optimizations were added, and many built-ins were completed.

This release again improves the scalability of Nuitka, which again uses less memory than before, although not as big a jump as before.

This does not extend or use special C code generation for bool or any type yet, which still needs design decisions to proceed and will come in a later release.

July 23, 2017 03:42 PM


Patricio Paez

Concatenating strings with punctuation

Creating strings of the form “a, b, c and d” from a list [‘a’, ‘b’, ‘c’, ‘d’] is a task I faced some time ago, as I needed to include such strings in some HTML documents. The “,” and the “and” are included according to the number of elements: [‘a’, ‘b’] yields “a and b” and [‘a’] yields “a”, for example. In a recent review of the code, I changed the method from using string concatenation:

if len(items) > 1:
    text = items[0]
    for item in items[1:-1]:
        text += ', ' + item
    text += ' and ' + items[-1]
else:
    text = items[0]

to the use of slicing of the items list, addition of the resulting sublists and str.join to include the punctuation:

first = items[:1]
middle = items[1:-1]
last = items[1:][-1:]
first_middle = [', '.join(first + middle)]
text = ' and '.join(first_middle + last)
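
Wrapped in a helper function (a name chosen purely for illustration), the new method behaves like this for each case:

def join_with_and(items):
    first = items[:1]
    middle = items[1:-1]
    last = items[1:][-1:]
    first_middle = [', '.join(first + middle)]
    return ' and '.join(first_middle + last)

print(join_with_and(['a', 'b', 'c', 'd']))  # a, b, c and d
print(join_with_and(['a', 'b']))            # a and b
print(join_with_and(['a']))                 # a
print(join_with_and([]))                    # prints an empty string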

The old method requires an additional elif branch to work when items is an empty list; the new method returns an empty string if the items list is empty. I share this tip in case it is useful to someone else.

July 23, 2017 11:45 AM


Full Stack Python

How to Add Hosted Monitoring to Flask Web Applications

How do you know whether your application is running properly with minimal errors after building and deploying it? The fastest and easiest way to monitor your operational Flask web application is to integrate one of the many available fantastic hosted monitoring tools.

In this post we will quickly add Rollbar monitoring to catch errors and confirm that our application is running properly. There are also many other great hosted monitoring tools, which you can check out on the monitoring page.

Our Tools

We can use either Python 2 or 3 to build this tutorial, but Python 3 is strongly recommended for all new applications. I used Python 3.6.2 to execute my code. We will also use the following application dependencies throughout the post:

If you need help getting your development environment configured before running this code, take a look at this guide for setting up Python 3 and Flask on Ubuntu 16.04 LTS.

All code in this blog post is available open source under the MIT license on GitHub under the monitor-flask-apps directory of the blog-code-examples repository. Use and abuse the source code as you desire for your own applications.

Installing Dependencies

Change into the directory where you keep your Python virtualenvs. Create a new virtual environment for this project using the following command.

python3 -m venv monitorflask

Activate the virtualenv.

source monitorflask/bin/activate

The command prompt will change after activating the virtualenv:

Activating our Python virtual environment on the command line.

Remember that you need to activate the virtualenv in every new terminal window where you want to use it to run the project.

Flask, Rollbar and Blinker can now be installed into the now-activated virtualenv.

pip install flask==0.12.2 rollbar==0.13.12 blinker==1.4

Our required dependencies should be installed within our virtualenv after a short installation period. Look for output like the following to confirm everything worked.

Installing collected packages: blinker, itsdangerous, click, MarkupSafe, Jinja2, Werkzeug, Flask, idna, urllib3, chardet, certifi, requests, six, rollbar
  Running setup.py install for blinker ... done
  Running setup.py install for itsdangerous ... done
  Running setup.py install for MarkupSafe ... done
  Running setup.py install for rollbar ... done
Successfully installed Flask-0.12.2 Jinja2-2.9.6 MarkupSafe-1.0 Werkzeug-0.12.2 blinker-1.4 certifi-2017.4.17 chardet-3.0.4 click-6.7 idna-2.5 itsdangerous-0.24 requests-2.18.1 rollbar-0.13.12 six-1.10.0 urllib3-1.21.1

Now that we have our Python dependencies installed into our virtualenv we can create the initial version of our application.

Building Our Flask App

Create a folder for your project named monitor-flask-apps. Change into the folder and then create a file named app.py with the following code.

import re
from flask import Flask, render_template, Response
from werkzeug.exceptions import NotFound


app = Flask(__name__)
MIN_PAGE_NAME_LENGTH = 2


@app.route("/<string:page>/")
def show_page(page):
    try:
        valid_length = len(page) >= MIN_PAGE_NAME_LENGTH
        valid_name = re.match('^[a-z]+$', page.lower()) is not None
        if valid_length and valid_name:
            return render_template("{}.html".format(page))
        else:
            msg = "Sorry, couldn't find page with name {}".format(page)
            raise NotFound(msg)
    except:
        return Response("404 Not Found")


if __name__ == "__main__":
    app.run(debug=True)

The above application code has some standard Flask imports so we can create a Flask web app and render template files. We have a single function named show_page to serve a single Flask route. show_page checks if the URL path contains only lowercase alpha characters for a potential page name. If the page name can be found in the templates folder then the page is rendered, otherwise an exception is thrown that the page could not be found. We need to create at least one template file if our function is ever going to return a non-error response.

Save app.py and make a new subdirectory named templates under your project directory. Create a new file named battlegrounds.html and put the following Jinja2 template markup into it.

<!DOCTYPE html>
<html>
  <head>
    <title>You found the Battlegrounds GIF!</title>
  </head>
  <body>
    <h1>PUBG so good.</h1>
    <img src="https://media.giphy.com/media/3ohzdLMlhId2rJuLUQ/giphy.gif">
  </body>
</html>

The above Jinja2 template is basic HTML without any embedded template tags. The template creates a very plain page with a header description of "PUBG so good" and a GIF from this excellent computer game.

Time to run and test our code. Change into the base directory of your project, where the app.py file is located. Execute app.py using the python command as follows (make sure your virtualenv is still activated in the terminal where you are running this command):

python app.py

The Flask development server should start up and display a few lines of output.

Run the Flask development server locally.

What happens when we access the application running on localhost port 5000?

Testing our Flask application at the base URL receives an HTTP 404 error.

HTTP status 404 page not found, which is what we expected because we only defined a single route and it did not live at the base path.

We created a template named battlegrounds.html that should be accessible when we go to localhost:5000/battlegrounds/.

Testing our Flask application at /battlegrounds/ gets the proper template with a GIF.

The application successfully found the battlegrounds.html template but that is the only one available. What if we try localhost:5000/fullstackpython/?

If no template is found we receive a 500 error.

HTTP 500 error. That's no good.

The 404 and 500 errors are obvious to us right now because we are testing the application locally. However, what happens when the app is deployed and a user gets the error in their own web browser? They will typically quit out of frustration and you will never know what happened unless you add some error tracking and application monitoring.

We will now modify our code to add Rollbar to catch and report those errors that occur for our users.

Handling Errors

Head to Rollbar's homepage so we can add their hosted monitoring tools to our oft-erroring Flask app.

Rollbar homepage in the web browser.

Click the "Sign Up" button in the upper right-hand corner. Enter your email address, a username and the password you want on the sign up page.

Enter your basic account information on the sign up page.

After the sign up page you will see the onboarding flow where you can enter a project name and select a programming language. For project name enter "Battlegrounds" and select that you are monitoring a Python app.

Create a new project named 'Battlegrounds' and select Python as the programming language.

Press the "Continue" button at the bottom to move along. The next screen shows us a few quick instructions to add monitoring to our Flask application.

Set up your project using your server-side access token.

Let's modify our Flask application to test whether we can properly connect to Rollbar's service. Change app.py to include the following highlighted lines.

~~import os
import re
~~import rollbar
from flask import Flask, render_template, Response
from werkzeug.exceptions import NotFound


app = Flask(__name__)
MIN_PAGE_NAME_LENGTH = 2


~~@app.before_first_request
~~def add_monitoring():
~~    rollbar.init(os.environ.get('ROLLBAR_SECRET'))
~~    rollbar.report_message('Rollbar is configured correctly')


@app.route("/<string:page>/")
def show_page(page):
    try:
        valid_length = len(page) >= MIN_PAGE_NAME_LENGTH
        valid_name = re.match('^[a-z]+$', page.lower()) is not None
        if valid_length and valid_name:
            return render_template("{}.html".format(page))
        else:
            msg = "Sorry, couldn't find page with name {}".format(page)
            raise NotFound(msg)
    except:
        return Response("404 Not Found")


if __name__ == "__main__":
    app.run(debug=True)

We added a couple of new imports, os and rollbar. os allows us to grab environment variable values, such as our Rollbar secret key. rollbar is the library we installed earlier. The two lines below the Flask app instantiation are to initialize Rollbar using the Rollbar secret token and send a message to the service that it started correctly.

The ROLLBAR_SECRET token needs to be set in an environment variable. Save and quit app.py. Run export ROLLBAR_SECRET='token here' on the command line where your virtualenv is activated. This token can be found on the Rollbar onboarding screen.

I typically store all my environment variables in a file like template.env and invoke it from the terminal using the . ./template.env command. Make sure to avoid committing your secret tokens to a source control repository, especially if the repository is public!
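
One optional safeguard (my own habit, not part of Rollbar's instructions) is to fail fast when the variable is missing:

import os

token = os.environ.get('ROLLBAR_SECRET')
if not token:
    raise RuntimeError('ROLLBAR_SECRET is not set; export it before starting the app')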

After exporting your ROLLBAR_SECRET key as an environment variable we can test that Rollbar is working as we run our application. Run it now using python:

python app.py

Back in your web browser press the "Done! Go to Dashboard" button. Don't worry about the "Report an Error" section code; we can get back to that in a moment.

If the event hasn't been reported yet we'll see a waiting screen like this one:

Waiting for data on the dashboard.

Once Flask starts up though, the first event will be populated on the dashboard.

First event populated on our dashboard for this project.

Okay, our first test event has been populated, but we really want to see all the errors from our application, not a test event.

Testing Error Handling

How do we make sure real errors are reported rather than just a simple test event? We just need to add a few more lines of code to our app.

import os
import re
import rollbar
~~import rollbar.contrib.flask
from flask import Flask, render_template, Response
~~from flask import got_request_exception
from werkzeug.exceptions import NotFound


app = Flask(__name__)
MIN_PAGE_NAME_LENGTH = 2


@app.before_first_request
def add_monitoring():
    rollbar.init(os.environ.get('ROLLBAR_SECRET'))
~~    ## delete the next line if you don't want this event anymore
    rollbar.report_message('Rollbar is configured correctly')
~~    got_request_exception.connect(rollbar.contrib.flask.report_exception, app)


@app.route("/<string:page>/")
def show_page(page):
    try:
        valid_length = len(page) >= MIN_PAGE_NAME_LENGTH
        valid_name = re.match('^[a-z]+$', page.lower()) is not None
        if valid_length and valid_name:
            return render_template("{}.html".format(page))
        else:
            msg = "Sorry, couldn't find page with name {}".format(page)
            raise NotFound(msg)
    except:
~~        rollbar.report_exc_info()
        return Response("404 Not Found")


if __name__ == "__main__":
    app.run(debug=True)

The above highlighted code modifies the application so it reports all Flask errors as well as our HTTP 404 not found issues that happen within the show_page function.

Make sure your Flask development server is running and try to go to localhost:5000/b/. You will receive an HTTP 404 exception and it will be reported to Rollbar. Next go to localhost:5000/fullstackpython/ and an HTTP 500 error will occur.

You should see an aggregation of errors as you test out these errors:

Rollbar dashboard showing aggregations of errors.

Woohoo, we finally have our Flask app reporting all errors that occur for any user back to the hosted Rollbar monitoring service!

What's Next?

We just learned how to catch and handle errors with Rollbar as a hosted monitoring platform in a simple Flask application. Next you will want to add monitoring to your more complicated web apps. You can also check out some of Rollbar's more advanced features such as:

There is a lot more to learn about web development and deployments so keep learning by reading up on Flask and other web frameworks such as Django, Pyramid and Sanic. You can also learn more about integrating Rollbar with Python applications via their Python documentation.

Questions? Let me know via a GitHub issue ticket on the Full Stack Python repository, on Twitter @fullstackpython or @mattmakai.

See something wrong in this blog post? Fork this page's source on GitHub and submit a pull request with a fix.

July 23, 2017 04:00 AM