skip to navigation
skip to content

Planet Python

Last update: September 29, 2017 01:47 PM

September 28, 2017


Tomasz Früboes

Flask and bower – download jQuery, others automatically

In last couple of weeks, I experimented a lot with flask and web development. Since this a learning phase, I tend to setup lots of small projects, where a couple of external libraries (such as bootstrap or jQuery) usually need to be downloaded. As I am rather lazy, using an external tool (bower) to download… Continue reading

September 28, 2017 07:46 PM


Stack Abuse

Read a File Line-by-Line in Python

Introduction

Over the course of my working life I have had the opportunity to use many programming concepts and technologies to do countless things. Some of these things involve relatively low value fruits of my labor, such as automating the error prone or mundane like report generation, task automation, and general data reformatting. Others have been much more valuable, such as developing data products, web applications, and data analysis and processing pipelines. One thing that is notable about nearly all of these projects is the need to simply open a file, parse its contents, and do something with it.

However, what do you do when the file you are trying to consume is quite large? Say the file is several GB of data or larger? Again, this has been another frequent aspect of my programming career, which has primarily been spent in the BioTech sector where files, up to a TB in size, are quite common.

The answer to this problem is to read in chunks of a file at a time, process it, then free it from memory so you can pull in and process another chunk until the whole massive file has been processed. While it is up to the programmer to determine a suitable chunk size, perhaps the most commonly used is simply a line of a file at a time.

This is what we will be discussing in this article - memory management by reading in a file line-by-line in Python. The code used in this article can be found in the following GitHub repo.

Basic File IO in Python

Being a great general purpose programming language, Python has a number of very useful file IO functionality in its standard library of built-in functions and modules. The built-in open() function is what you use to open a file object for either reading or writing purposes.

fp = open('path/to/file.txt', 'r')  

The open() function takes in multiple arguments. We will be focusing on the first two, with the first being a positional string parameter representing the path to the file that should be opened. The second optional parameter is also a string which specifies the mode of interaction you intend for the file object being returned by the function call. The most common modes are listed in the table below, with the default being 'r' for reading.

Mode Description
`r` Open for reading plain text
`w` Open for writing plain text
`a` Open an existing file for appending plain text
`rb` Open for reading binary data
`wb` Open for writing binary data

Once you have written or read all of the desired data for a file object you need to close the file so that resources can be reallocated on the operating system that the code is running on.

fp.close()  

You will often see many code snippets on the internet or in programs in the wild that do not explicitly close file objects that have been generated in accord with the example above. It is always good practice to close a file object resource, but many of us either are too lazy or forgetful to do so or think we are smart because documentation suggests that an open file object will self close once a process terminates. This is not always the case.

Instead of harping on how important it is to always call close() on a file object, I would like to provide an alternate and more elegant way to open a file object and ensure that the Python interpreter cleans up after us :)

with open('path/to/file.txt') as fp:  
    # do stuff with fp

By simply using the with keyword (introduced in python 2.5) to wrap our code for opening a file object, the internals of Python will do something similar to the following code to ensure that no matter what the file object is closed after use.

try:  
    fp = open('path/to/file.txt')
    # do stuff here
finally:  
    fp.close()

Reading Line by Line

Now, lets get to actually reading in a file. The file object returned from open() has three common explicit methods (read, readline, and readlines) to read in data and one more implicit way.

The read method will read in all the data into one text string. This is useful for smaller files where you would like to do text manipulation on the entire file, or whatever else suits you. Then there is readline which is one useful way to only read in individual line incremental amounts at a time and return them as strings. The last explicit method, readlines, will read all the lines of a file and return them as a list of strings.

As mentioned earlier, you can use these methods to only load small chunks of the file at a time. To do this with these methods, you can pass a parameter to them telling how many bytes to load at a time. This is the only argument these methods accept.

One implementation for reading a text file one line at a time might look like the following code. Note that for the remainder of this article I will be demonstrating how to read in the text of the book The "Iliad of Homer" which can be found at gutenberg.org, as well as in the GitHub repo where the code is for this article.

In readline.py you will find the following code. In the terminal if you run $ python readline.py you can see the output of reading all the lines of the Iliad, as well as their line numbers.

filepath = 'Iliad.txt'  
with open(filepath) as fp:  
   line = fp.readline()
   cnt = 1
   while line:
       print("Line {}: {}".format(cnt, line.strip()))
       line = fp.readline()
       cnt += 1

The above code snippet opens a file object stored as a variable called fp, then reads in a line at a time by calling readline on that file object iteratively in a while loop and prints it to the console.

Running this code you should see something like the following:

$ python forlinein.py 
Line 0: BOOK I  
Line 1:  
Line 2: The quarrel between Agamemnon and Achilles--Achilles withdraws  
Line 3: from the war, and sends his mother Thetis to ask Jove to help  
Line 4: the Trojans--Scene between Jove and Juno on Olympus.  
Line 5:  
Line 6: Sing, O goddess, the anger of Achilles son of Peleus, that brought  
Line 7: countless ills upon the Achaeans. Many a brave soul did it send  
Line 8: hurrying down to Hades, and many a hero did it yield a prey to dogs and  
Line 9: vultures, for so were the counsels of Jove fulfilled from the day on  
...

While this is perfectly fine, there is one final way that I mentioned fleetingly earlier, which is less explicit but a bit more elegant, which I greatly prefer. This final way of reading in a file line-by-line includes iterating over a file object in a for loop assigning each line to a special variable called line. The above code snippet can be replicated in the following code, which can be found in the Python script forlinein.py:

filepath = 'Iliad.txt'  
with open(filepath) as fp:  
   for cnt, line in enumerate(fp):
       print("Line {}: {}".format(cnt, line))

In this implementation we are taking advantage of a built-in Python functionality that allows us to iterate over the file object implicitly using a for loop in combination of using the iterable object fp. Not only is this simpler to read but it also takes fewer lines of code to write, which is always a best practice worthy of following.

An Example Application

I would be remiss to write an application on how to consume information in a text file without demonstrating at least a trivial usage of how to use such a worthy skill. That being said, I will be demonstrating a small application that can be found in wordcount.py, which calculates the frequency of each word present in "The Iliad of Homer" used in previous examples.

import sys  
import os

def main():  
   filepath = sys.argv[1]

   if not os.path.isfile(filepath):
       print("File path {} does not exist. Exiting...".format(filepath))
       sys.exit()

   bag_of_words = {}
   with open(filepath) as fp:
       cnt = 0
       for line in fp:
           print("line {} contents {}".format(cnt, line))
           record_word_cnt(line.strip().split(' '), bag_of_words)
           cnt += 1
   sorted_words = order_bag_of_words(bag_of_words, desc=True)
   print("Most frequent 10 words {}".format(sorted_words[:10]))

def order_bag_of_words(bag_of_words, desc=False):  
   words = [(word, cnt) for word, cnt in bag_of_words.items()]
   return sorted(words, key=lambda x: x[1], reverse=desc)

def record_word_cnt(words, bag_of_words):  
   for word in words:
       if word != '':
           if word.lower() in bag_of_words:
               bag_of_words[word.lower()] += 1
           else:
               bag_of_words[word.lower()] = 0

if __name__ == '__main__':  
   main()

The above code represents a command line python script that expects a file path passed in as an argument. The script uses the os module to make sure that the passed in file path is a file that exists on the disk. If the path exists then each line of the file is read and passed to a function called record_word_cnt as a list of strings, delimited the spaces between words as well as a dictionary called bag_of_words. The record_word_cnt function counts each instance of every word and records it in the bag_of_words dictionary.

Once all the lines of the file are read and recorded in the bag_of_words dictionary, then a final function call to order_bag_of_words is called, which returns a list of tuples in (word, word count) format, sorted by word count. The returned list of tuples is used to print the most frequently occurring 10 words.

Conclusion

So, in this article we have explored ways to read a text file line-by-line in two ways, including a way that I feel is a bit more Pythonic (this being the second way demonstrated in forlinein.py). To wrap things up I presented a trivial application that is potentially useful for reading in and preprocessing data that could be used for text analytics or sentiment analysis.

As always I look forward to your comments and I hope you can use what has been discussed to develop exciting and useful applications.

September 28, 2017 06:20 PM


Mike Driscoll

wxPython: All About Accelerators

The wxPython toolkit supports using keyboard shortcuts via the concept of Accelerators and Accelerator Tables. You can also bind directly to key presses, but in a lot of cases, you will want to go with Accelerators. The accelerator gives to the ability to add a keyboard shortcut to your application, such as the ubiquitous “CTRL+S” that most applications use for saving a file. As long as your application has focus, this keyboard shortcut can be added trivially.

Note that you will normally add an accelerator table to your wx.Frame instance. If you happen to have multiple frames in your application, then you may need to add an accelerator table to multiple frames depending on your design.

Let’s take a look at a simple example:

import wx
 
class MyForm(wx.Frame):
 
    def __init__(self):
        wx.Frame.__init__(self, None, title="Accelerator Tutorial", 
                          size=(500,500))
 
        # Add a panel so it looks the correct on all platforms
        panel = wx.Panel(self, wx.ID_ANY)
 
        randomId = wx.NewId()
        self.Bind(wx.EVT_MENU, self.onKeyCombo, id=randomId)
        accel_tbl = wx.AcceleratorTable([(wx.ACCEL_CTRL,  ord('Q'), 
                                          randomId )])
        self.SetAcceleratorTable(accel_tbl)
 
    def onKeyCombo(self, event):
        """"""
        print "You pressed CTRL+Q!"
 
# Run the program
if __name__ == "__main__":
    app = wx.App(False)
    frame = MyForm()
    frame.Show()
    app.MainLoop()

This can end up looking a bit ugly if you have a lot of keyboard shortcuts that you need to add to your application as you end upw tih a list of tuples that just looks kind of odd. You will find this way or writing an AcceleratorTable more often than not though. However there are other ways to add entries to your AcceleratorTable. Let’s take a look at an example from wxPython’s documentation:

entries = [wx.AcceleratorEntry() for i in xrange(4)]
 
entries[0].Set(wx.ACCEL_CTRL, ord('N'), ID_NEW_WINDOW)
entries[1].Set(wx.ACCEL_CTRL, ord('X'), wx.ID_EXIT)
entries[2].Set(wx.ACCEL_SHIFT, ord('A'), ID_ABOUT)
entries[3].Set(wx.ACCEL_NORMAL, wx.WXK_DELETE, wx.ID_CUT)
 
accel = wx.AcceleratorTable(entries)
frame.SetAcceleratorTable(accel)

Here we create a list of four wx.AcceleratorEntry() objects using a list comprehension. Then we access each of the entries in the list using the Python list’s index to call each entry’s Set method. The rest of the code is pretty similar to what you saw before. Let’s take a moment to make this code actually runnable:

import wx
 
class MyForm(wx.Frame):
 
    def __init__(self):
        wx.Frame.__init__(self, None, title="AcceleratorEntry Tutorial", 
                          size=(500,500))
 
        # Add a panel so it looks the correct on all platforms
        panel = wx.Panel(self, wx.ID_ANY)
 
        exit_menu_item = wx.MenuItem(id=wx.NewId(), text="Exit",
                               helpString="Exit the application")
        about_menu_item = wx.MenuItem(id=wx.NewId(), text='About')
 
        ID_NEW_WINDOW = wx.NewId()
        ID_ABOUT = wx.NewId()
 
        self.Bind(wx.EVT_MENU, self.on_new_window, id=ID_NEW_WINDOW)
        self.Bind(wx.EVT_MENU, self.on_about, id=ID_ABOUT)
 
        entries = [wx.AcceleratorEntry() for i in range(4)]
 
        entries[0].Set(wx.ACCEL_CTRL, ord('N'),
                       ID_NEW_WINDOW, exit_menu_item)
        entries[1].Set(wx.ACCEL_CTRL, ord('X'), wx.ID_EXIT)
        entries[2].Set(wx.ACCEL_SHIFT, ord('A'), ID_ABOUT, 
                       about_menu_item)
        entries[3].Set(wx.ACCEL_NORMAL, wx.WXK_DELETE, wx.ID_CUT)
 
        accel_tbl = wx.AcceleratorTable(entries)
        self.SetAcceleratorTable(accel_tbl)
 
    def on_new_window(self, event):
        """"""
        print("You pressed CTRL+N!")
 
    def on_about(self, event):
        print('You pressed SHIFT+A')
 
 
# Run the program
if __name__ == "__main__":
    app = wx.App(False)
    frame = MyForm()
    frame.Show()
    app.MainLoop()

First of all, I want to note that I don’t have all the accelerators hooked up. For example, “CTRL+X” won’t actually exit the program. But I did go ahead and hook up “CTRL+N” and “SHIFT+A”. Try running the code and see how it works.

You can also be slightly more explicit and create your AcceleratorEntry() objects one by one instead of using a list comprehension. Let’s modify our code a bit and see how that works:

import wx
 
class MyForm(wx.Frame):
 
    def __init__(self):
        wx.Frame.__init__(self, None, 
                          title="AcceleratorEntry Tutorial", 
                          size=(500,500))
 
        # Add a panel so it looks the correct on all platforms
        panel = wx.Panel(self, wx.ID_ANY)
 
        exit_menu_item = wx.MenuItem(id=wx.NewId(), text="Exit",
                               helpString="Exit the application")
        about_menu_item = wx.MenuItem(id=wx.NewId(), text='About')
 
        ID_NEW_WINDOW = wx.NewId()
        ID_ABOUT = wx.NewId()
 
        self.Bind(wx.EVT_MENU, self.on_new_window, id=ID_NEW_WINDOW)
        self.Bind(wx.EVT_MENU, self.on_about, id=ID_ABOUT)
 
        entry_one = wx.AcceleratorEntry(wx.ACCEL_CTRL, ord('N'),
                                        ID_NEW_WINDOW, 
                                        exit_menu_item)
        entry_two = wx.AcceleratorEntry(wx.ACCEL_SHIFT, ord('A'), 
                                        ID_ABOUT, 
                                        about_menu_item)
        entries = [entry_one, entry_two]
 
        accel_tbl = wx.AcceleratorTable(entries)
        self.SetAcceleratorTable(accel_tbl)
 
 
    def on_new_window(self, event):
        """"""
        print("You pressed CTRL+N!")
 
    def on_about(self, event):
        print('You pressed SHIFT+A')
 
# Run the program
if __name__ == "__main__":
    app = wx.App(False)
    frame = MyForm()
    frame.Show()
    app.MainLoop()

Frankly I think like this version the best as it’s the most explicit. The “Zen of Python” is always about advocating doing things explicitly over implicitly so I think this also follows that paradigm well.


Wrapping Up

Now you know a couple of different ways to create keyboard shortcuts (accelerators) for your application. They are very handy and can enhance your application’s usefulness.


Related Reading

September 28, 2017 04:15 PM


Semaphore Community

Dockerizing a Python Django Web Application

This article is brought with ❤ to you by Semaphore.

Introduction

This article will cover building a simple 'Hello World'-style web application written in Django and running it in the much talked about and discussed Docker. Docker takes all the great aspects of a traditional virtual machine, e.g. a self contained system isolated from your development machine, and removes many of the drawbacks such as system resource drain, setup time, and maintenance.

When building web applications, you have probably reached a point where you want to run your application in a fashion that is closer to your production environment. Docker allows you to set up your application runtime in such a way that it runs in exactly the same manner as it will in production, on the same operating system, with the same environment variables, and any other configuration and setup you require.

By the end of the article you'll be able to:

What is Docker, Anyway?

Docker's homepage describes Docker as follows:

"Docker is an open platform for building, shipping and running distributed applications. It gives programmers, development teams, and operations engineers the common toolbox they need to take advantage of the distributed and networked nature of modern applications."

Put simply, Docker gives you the ability to run your applications within a controlled environment, known as a container, built according to the instructions you define. A container leverages your machines resources much like a traditional virtual machine (VM). However, containers differ greatly from traditional virtual machines in terms of system resources. Traditional virtual machines operate using Hypervisors, which manage the virtualization of the underlying hardware to the VM. This means they are large in terms of system requirements.

Containers operate on a shared Linux operating system base and add simple instructions on top to execute and run your application or process. The difference being that Docker doesn't require the often time-consuming process of installing an entire OS to a virtual machine such as VirtualBox or VMWare. Once Docker is installed, you create a container with a few commands and then execute your applications on it via the Dockerfile. Docker manages the majority of the operating system virtualization for you, so you can get on with writing applications and shipping them as you require in the container you have built. Furthermore, Dockerfiles can be shared for others to build containers and extend the instructions within them by basing their container image on top of an existing one. The containers are also highly portable and will run in the same manner regardless of the host OS they are executed on. Portability is a massive plus side of Docker.

Prerequisites

Before you begin this tutorial, ensure the following is installed to your system:

Setting Up a Django web application

Starting a Django application is easy, as the Django dependency provides you with a command line tool for starting a project and generating some of the files and directory structure for you. To start, create a new folder that will house the Django application and move into that directory.

$ mkdir project
$ cd project

Once in this folder, you need to add the standard Python project dependencies file which is usually named requirements.txt, and add the Django and Gunicorn dependency to it. Gunicorn is a production standard web server, which will be used later in the article. Once you have created and added the dependencies, the file should look like this:

$ cat requirements.txt
Django==1.9.4
gunicorn==19.6.0

With the Django dependency added, you can then install Django using the following command:

$ pip install -r requirements.txt

Once installed, you will find that you now have access to the django-admin command line tool, which you can use to generate the project files and directory structure needed for the simple "Hello, World!" application.

$ django-admin startproject helloworld

Let's take a look at the project structure the tool has just created for you:

.
├── helloworld
│   ├── helloworld
│   │   ├── __init__.py
│   │   ├── settings.py
│   │   ├── urls.py
│   │   └── wsgi.py
│   └── manage.py
└── requirements.txt

You can read more about the structure of Django on the official website. django-admin tool has created a skeleton application. You control the application for development purposes using the manage.py file, which allows you to start the development test web server for example:

$ cd helloworld
$ python manage.py runserver

The other key file of note is the urls.py, which specifies what URL's route to which view. Right now, you will only have the default admin URL which we won't be using in this tutorial. Lets add a URL that will route to a view returning the classic phrase "Hello, World!".

First, create a new file called views.py in the same directory as urls.py with the following content:

from django.http import HttpResponse

def index(request):
    return HttpResponse("Hello, world!")

Now, add the following URL url(r'', 'helloworld.views.index') to the urls.py, which will route the base URL of / to our new view. The contents of the urls.py file should now look as follows:

from django.conf.urls import url
from django.contrib import admin

urlpatterns = [
    url(r'^admin/', admin.site.urls),
    url(r'', 'helloworld.views.index'),
]

Now, when you execute the python manage.py runserver command and visit http://localhost:8000 in your browser, you should see the newly added "Hello, World!" view.

The final part of our project setup is making use of the Gunicorn web server. This web server is robust and built to handle production levels of traffic, whereas the included development server of Django is more for testing purposes on your local machine only. Once you have dockerized the application, you will want to start up the server using Gunicorn. This is much simpler if you write a small startup script for Docker to execute. With that in mind, let's add a start.sh bash script to the root of the project, that will start our application using Gunicorn.

#!/bin/bash

# Start Gunicorn processes
echo Starting Gunicorn.
exec gunicorn helloworld.wsgi:application \
    --bind 0.0.0.0:8000 \
    --workers 3

The first part of the script writes "Starting Gunicorn" to the command line to show us that it is starting execution. The next part of the script actually launches Gunicorn. You use exec here so that the execution of the command takes over the shell script, meaning that when the Gunicorn process ends so will the script, which is what we want here.

You then pass the gunicorn command with the first argument of helloworld.wsgi:application. This is a reference to the wsgi file Django generated for us and is a Web Server Gateway Interface file which is the Python standard for web applications and servers. Without delving too much into WSGI, the file simply defines the application variable, and Gunicorn knows how to interact with the object to start the web server.

You then pass two flags to the command, bind to attach the running server to port 8000, which you will use to communicate with the running web server via HTTP. Finally, you specify workers which are the number of threads that will handle the requests coming into your application. Gunicorn recommends this value to be set at (2 x $num_cores) + 1. You can read more on configuration of Gunicorn in their documentation.

Finally, make the script executable, and then test if it works by changing directory into the project folder helloworld and executing the script as shown here. If everything is working fine, you should see similar output to the one below, be able to visit http://localhost:8000 in your browser, and get the "Hello, World!" response.

$ chmod +x start.sh
$ cd helloworld
$ ../start.sh
Starting Gunicorn.
[2016-06-26 19:43:28 +0100] [82248] [INFO]
Starting gunicorn 19.6.0
[2016-06-26 19:43:28 +0100] [82248] [INFO]
Listening at: http://0.0.0.0:8000 (82248)
[2016-06-26 19:43:28 +0100] [82248] [INFO]
Using worker: sync
[2016-06-26 19:43:28 +0100] [82251] [INFO]
Booting worker with pid: 82251
[2016-06-26 19:43:28 +0100] [82252] [INFO]
Booting worker with pid: 82252
[2016-06-26 19:43:29 +0100] [82253] [INFO]
Booting worker with pid: 82253

Dockerizing the Application

You now have a simple web application that is ready to be deployed. So far, you have been using the built-in development web server that Django ships with the web framework it provides. It's time to set up the project to run the application in Docker using a more robust web server that is built to handle production levels of traffic.

Installing Docker

One of the key goals of Docker is portability, and as such is able to be installed on a wide variety of operating systems.

For this tutorial, you will look at installing Docker Machine on MacOS. The simplest way to achieve this is via the Homebrew package manager. Instal Homebrew and run the following:

$ brew update && brew upgrade --all && brew cleanup && brew prune
$ brew install docker-machine

With Docker Machine installed, you can use it to create some virtual machines and run Docker clients. You can run docker-machine from your command line to see what options you have available. You'll notice that the general idea of docker-machine is to give you tools to create and manage Docker clients. This means you can easily spin up a virtual machine and use that to run whatever Docker containers you want or need on it.

You will now create a virtual machine based on VirtualBox that will be used to execute your Dockerfile, which you will create shortly. The machine you create here should try to mimic the machine you intend to run your application on in production. This way, you should not see any differences or quirks in your running application neither locally nor in a deployed environment.

Create your Docker Machine using the following command:

$ docker-machine create development --driver virtualbox
--virtualbox-disk-size "5000" --virtualbox-cpu-count 2
--virtualbox-memory "4096"

This will create your machine and output useful information on completion. The machine will be created with 5GB hard disk, 2 CPU's and 4GB of RAM.

To complete the setup, you need to add some environment variables to your terminal session to allow the Docker command to connect the machine you have just created. Handily, docker-machine provides a simple way to generate the environment variables and add them to your session:

$ docker-machine env development
export DOCKER_TLS_VERIFY="1"
export DOCKER_HOST="tcp://123.456.78.910:1112"
export DOCKER_CERT_PATH="/Users/me/.docker/machine/machines/development"
export DOCKER_MACHINE_NAME="development"
# Run this command to configure your shell:
# eval "$(docker-machine env development)"

Complete the setup by executing the command at the end of the output:

$(docker-machine env development)

Execute the following command to ensure everything is working as expected.

$ docker images
REPOSITORY   TAG   IMAGE  ID   CREATED   SIZE

You can now dockerize your Python application and get it running using the docker-machine.

Writing the Dockerfile

The next stage is to add a Dockerfile to your project. This will allow Docker to build the image it will execute on the Docker Machine you just created. Writing a Dockerfile is rather straightforward and has many elements that can be reused and/or found on the web. Docker provides a lot of the functions that you will require to build your image. If you need to do something more custom on your project, Dockerfiles are flexible enough for you to do so.

The structure of a Dockerfile can be considered a series of instructions on how to build your container/image. For example, the vast majority of Dockerfiles will begin by referencing a base image provided by Docker. Typically, this will be a plain vanilla image of the latest Ubuntu release or other Linux OS of choice. From there, you can set up directory structures, environment variables, download dependencies, and many other standard system tasks before finally executing the process which will run your web application.

Start the Dockerfile by creating an empty file named Dockerfile in the root of your project. Then, add the first line to the Dockerfile that instructs which base image to build upon. You can create your own base image and use that for your containers, which can be beneficial in a department with many teams wanting to deploy their applications in the same way.

# Dockerfile

# FROM directive instructing base image to build upon
FROM python:2-onbuild

It's worth noting that we are using a base image that has been created specifically to handle Python 2.X applications and a set of instructions that will run automatically before the rest of your Dockerfile. This base image will copy your project to /usr/src/app, copy your requirements.txt and execute pip install against it. With these tasks taken care of for you, your Dockerfile can then prepare to actually run your application.

Next, you can copy the start.sh script written earlier to a path that will be available to you in the container to be executed later in the Dockerfile to start your server.

# COPY startup script into known file location in container
COPY start.sh /start.sh

Your server will run on port 8000. Therefore, your container must be set up to allow access to this port so that you can communicate to your running server over HTTP. To do this, use the EXPOSE directive to make the port available:

# EXPOSE port 8000 to allow communication to/from server
EXPOSE 8000

The final part of your Dockerfile is to execute the start script added earlier, which will leave your web server running on port 8000 waiting to take requests over HTTP. You can execute this script using the CMD directive.

# CMD specifcies the command to execute to start the server running.
CMD ["/start.sh"]
# done!

With all this in place, your final Dockerfile should look something like this:

# Dockerfile

# FROM directive instructing base image to build upon
FROM python:2-onbuild

# COPY startup script into known file location in container
COPY start.sh /start.sh

# EXPOSE port 8000 to allow communication to/from server
EXPOSE 8000

# CMD specifcies the command to execute to start the server running.
CMD ["/start.sh"]
# done!

You are now ready to build the container image, and then run it to see it all working together.

Building and Running the Container

Building the container is very straight forward once you have Docker and Docker Machine on your system. The following command will look for your Dockerfile and download all the necessary layers required to get your container image running. Afterwards, it will run the instructions in the Dockerfile and leave you with a container that is ready to start.

To build your container, you will use the docker build command and provide a tag or a name for the container, so you can reference it later when you want to run it. The final part of the command tells Docker which directory to build from.

$ cd <project root directory>
$ docker build -t davidsale/dockerizing-python-django-app .

Sending build context to Docker daemon 237.6 kB
Step 1 : FROM python:2-onbuild
# Executing 3 build triggers...
Step 1 : COPY requirements.txt /usr/src/app/
 ---> Using cache
Step 1 : RUN pip install --no-cache-dir -r requirements.txt
 ---> Using cache
Step 1 : COPY . /usr/src/app
 ---> 68be8680cbc4
Removing intermediate container 75ed646abcb6
Step 2 : COPY start.sh /start.sh
 ---> 9ef8e82c8897
Removing intermediate container fa73f966fcad
Step 3 : EXPOSE 8000
 ---> Running in 14c752364595
 ---> 967396108654
Removing intermediate container 14c752364595
Step 4 : WORKDIR helloworld
 ---> Running in 09aabb677b40
 ---> 5d714ceea5af
Removing intermediate container 09aabb677b40
Step 5 : CMD /start.sh
 ---> Running in 7f73e5127cbe
 ---> 420a16e0260f
Removing intermediate container 7f73e5127cbe
Successfully built 420a16e0260f

In the output, you can see Docker processing each one of your commands before outputting that the build of the container is complete. It will give you a unique ID for the container, which can also be used in commands alongside the tag.

The final step is to run the container you have just built using Docker:

$ docker run -it -p 8000:8000 davidsale/djangoapp1
Starting Gunicorn.
[2016-06-26 19:24:11 +0000] [1] [INFO]
Starting gunicorn 19.6.0
[2016-06-26 19:24:11 +0000] [1] [INFO]
Listening at: http://0.0.0.0:9077 (1)
[2016-06-26 19:24:11 +0000] [1] [INFO]
Using worker: sync
[2016-06-26 19:24:11 +0000] [11] [INFO]
Booting worker with pid: 11
[2016-06-26 19:24:11 +0000] [12] [INFO]
Booting worker with pid: 12
[2016-06-26 19:24:11 +0000] [17] [INFO]
Booting worker with pid: 17

The command tells Docker to run the container and forward the exposed port 8000 to port 8000 on your local machine. After you run this command, you should be able to visit http://localhost:8000 in your browser to see the "Hello, World!" response. If you were running on a Linux machine, that would be the case. However, if running on MacOS, then you will need to forward the ports from VirtualBox, which is the driver we use in this tutorial so that they are accessible on your host machine.

$ VBoxManage controlvm "development" natpf1
  "tcp-port8000,tcp,,8000,,8000";

This command modifies the configuration of the virtual machine created using docker-machine earlier to forward port 8000 to your host machine. You can run this command multiple times changing the values for any other ports you require.

Once you have done this, visit http://localhost:8000 in your browser. You should be able to visit your dockerized Python Django application running on a Gunicorn web server, ready to take thousands of requests a second and ready to be deployed on virtually any OS on planet using Docker.

Next Steps

After manually verifying that the application is behaving as expected in Docker, the next step is the deployment. You can use Semaphore's Docker platform for automating this process.

Continuous Integration and Deployment for Docker projects on Semaphore

As a first step you need to create a free Semaphore account. Then, connect your Docker project repository to your new account. Semaphore will recognize that you're using Docker, and will automatically recommend the Docker platform for it.

The last step is to specify commands to build and run your Docker images:

docker build <your-project> .
docker run <your-project>

Semaphore will execute these commands on every git push.

Semaphore also makes it easy to push your images to various Docker container registries. To learn more about getting the most out of Docker on Semaphore, check out our Docker documentation pages.

Conclusion

In this tutorial, you have learned how to build a simple Python Django web application, wrap it in a production grade web server, and created a Docker container to execute your web server process.

If you enjoyed working through this article, feel free to share it and if you have any questions or comments leave them in the section below. We will do our best to answer them, or point you in the right direction.

This article is brought with ❤ to you by Semaphore.

September 28, 2017 01:09 PM


EuroPython

EuroPython 2017: Videos for Monday available online

We are pleased to announce the first batch of cut videos for EuroPython 2017. 

To see the videos, please head over to our EuroPython YouTube channel and select the “EuroPython 2017″ playlist.

Here’s our brand new teaser video for the conference:

In the coming weeks, we will release the other videos, in batches of one conference day per week.

Enjoy,

EuroPython 2017 Team
EuroPython Society
EuroPython 2017 Conference

September 28, 2017 12:10 PM


hypothesis.works articles

The Threshold Problem

In my last post I mentioned the problem of bug slippage: When you start with one bug, reduce the test case, and end up with another bug.

I’ve run into another related problem twice now, and it’s not one I’ve seen talked about previously.

The problem is this: Sometimes shrinking makes a bug seem much less interesting than it actually is.

Read more...

September 28, 2017 11:00 AM


"Fredrik Håård's Blaag"

Provisioning Elasticsearch and Kibana with Terraform

Fair warning up front: this is not a Terraform, AWS, or Elasticsearch tutorial. You'll need to know a bit or read the docs to apply the examples.

When I wanted to add the AWS version of ELK (Elasticsearch, Logstash, Kibana) which is Elasticsearch, Cloudwatch and Kibana, I hit a road block that Terraform did not support provisioning the actual streaming of logs from Cloudwatch to Elasticsearch naively. Googling lead me approximately nowhere, and I had to devise a solution from scratch.

Automating setting up Cloudwatch once hosts are set up is reasonably straightforward using e.g. Ansible, so it's the Lambda function to parse messages and send them to Elasticsearch that is the problem. It's super simple to set up if you follow the docs and click through the AWS console, but there are few hints on how you would automate it reliably.

Terraform supports setting up Cloudwatch log subscription filters and Elasticsearch clusters/domains as well as Lambda functions, and once you know what to put in the Lambda, the process is just plain old Terraform. While the solution is not very sophisticated, it works for me ^(tm) and I belive it might work for others as well.

As so often in software, the solution is to copy something that works. In this case, I created a subscription filter from some random Cloudwatch group to a freshly created Elasticsearch domain on a test account, and then checked the resulting (NodeJS) code for the generated Lambda function. To my delight, the code has almost no environment- or target specific contents except for the Elasticsearch endpoint itself, so there's not much that needs to be done to adapt it for any setup.

In this case, I simply replaced references to the endpoint with references to a environment variable ES_ENDPOINT, which I then inject when creating the lambda function.

Putting the resulting index.js (or whatever) into a zip file, we can then provision it using Terraform:


resource "aws_lambda_function" "logs-to-es-lambda" {
  filename         = "files/es_logs_lambda.zip"
  description      = "CloudWatch Logs to Amazon ES"
  function_name    = "Logs_To_Elasticsearch"
  role             = "${aws_iam_role.my-es-execution-role.arn}"
  handler          = "index.handler"
  source_code_hash = '${base64sha256(file("files/es_logs_lambda.zip"))}'
  runtime          = "nodejs4.3"
  timeout          = 60

  environment {
    variables = {
      ES_ENDPOINT = "${aws_elasticsearch_domain.my-es-domain.endpoint}"
    }
  }
}

With the Lambda function in place, we can create a log subscription filter to send events to it:


resource "aws_lambda_permission" "cloudwatch-lambda-permission" {
  statement_id = "allow-cloudwatch-lambda"
  action = "lambda:InvokeFunction"
  function_name = "${aws_lambda_function.logs-to-es-lambda.arn}"
  principal = "logs.${var.my-aws-region}.amazonaws.com"
  source_arn = "${aws_cloudwatch_log_group.my-log-group.arn}"
}


resource "aws_cloudwatch_log_subscription_filter" "logs-subscription" {
  name            = "ElasticsearchStream-logs"
  depends_on = ["aws_lambda_permission.cloudwatch-lambda-permission"]
  log_group_name  = "${aws_cloudwatch_log_group.my-log-group.name}"
  filter_pattern  = "[timestamp, level, thread, name, message]"
  destination_arn = "${aws_lambda_function.logs-to-es-lambda.arn}"
}

The filter pattern follows the same rules as in the AWS console, and I'd generally use the console and some example data to try it out since I find the documentation incredibly hard to find and parse.

For a full example, you'd of course need to provision the role (aws_iam_role), domain (aws_elasticsearch_domain) and stream some interesting logs to the log group! In addition, the first time I did this I hooked a quick Python hack to inject the endpoint and zip the JavaScript source on the fly, but I don't think that makes much sense in retrospect. The last couple of projects I've done this, I've simply committed the ZIP to the source repository instead.

September 28, 2017 07:39 AM

September 27, 2017


Stack Abuse

Command Line Arguments in Python

Overview

Computer programs are written with a specific purpose in mind. Tools on UNIX/Linux systems follow the idea of specialization - one tool for one task, but to do it as perfect as possible, then. Nethertheless, as with other tools, this allows you to combine single programs, and to create powerful tool chains.

With the help of command line arguments that are passed to programs, you can deal with much more specific use cases. Command line arguments allow you to enable programs to act in a certain way, for example to output additional information, or to read data from a specified source, and to interpret this data in a desired format.

In general, operating systems accept arguments in a certain notation, for example:

These different approaches exist due to historical reasons. Many programs on UNIX-like systems support either the UNIX way, or the GNU way, or both. The UNIX notation is mostly used with single letter options while GNU presents a more readable options list particularly useful to document what is running.

Keep in mind that both the name and the meaning of an argument are specific to a program - there is no general definition, but a few conventions like --help for further information on the usage of the tool. As the developer of a Python script you decide which arguments are valid, and what they stand for, actually. This requires proper evaluation. Read on how to do it using Python.

Handling command line arguments with Python

Python 3 supports four different ways of handling command line arguments. The oldest one is the sys module. In terms of names, and its usage, it relates directly to the C library (libc). The second way is the getopt module which handles both short, and long options, including the evaluation of the parameter values.

Furthermore, two lesser-known ways exist. This is the argparse module which is derived from the optparse module available up to Python 2.7, formerly, and the docopt module is available from GitHub. All the modules are fully documented and worth reading.

The sys Module

This is a basic module that was shipped with the Python distribution from the early days on. It has a quite similar approach as the C library using argc/argv to access the arguments. The sys module implements the command line arguments in a simple list structure named sys.argv.

Each list element represents a single argument. The first one -- sys.argv[0] -- is the name of the Python script. The other list elements -- sys.argv[1] to sys.argv[n] -- are the command line arguments 2 to n. As a delimiter between the arguments, a space is in use. Argument values that contain a space in it have to be quoted, accordingly.

The equivalent of argc is just the number of elements in the list. To obtain this value use the Python len() operator. Example 2 will explain this in detail.

Example 1

In this first example, we determine the way we were called. This information is kept in the first command line argument, indexed with 0. Listing 1 displays how you obtain the name of your Python script.

Example 1: Determine the name of the Python script

import sys

print ("the script has the name %s" % (sys.argv[0])  

Save this code in a file named arguments-programname.py, and then call it as shown in Listing 1. The output is as follows and contains the file name, including its full path.

Listing 1: Call the script

$ python arguments-programname.py
the script has the name arguments-programname.py  
$
$ python /home/user/arguments-programname.py
the script has the name /home/user/arguments-programname.py  
Example 2

In the second example we simply count the number of command line arguments using the built-in len() method. sys.argv is the list that we have to examine. In Example 2, a value of 1 is subtracted to get the right index (argument list counters start from zero). As you may remember from Example 1, the first element contains the name of the Python script, which we skip here.

Example 2: Count the arguments

import sys

# count the arguments
arguments = len(sys.argv) - 1  
print ("the script is called with %i arguments" % (arguments))  

Save and name this file arguments-count.py. The call is displayed in Listing 2. This includes three different scenarios: a) a call without any further command line arguments, b) a call with two arguments, and c) with two arguments where the second one is a quoted string (a string that contains a space).

Listing 2: Call the script

$ python arguments-count.py
the script is called with 0 arguments  
$
$ python arguments-count.py --help me
the script is called with 2 arguments  
$
$ python arguments-count.py --option "long string"
the script is called with 2 arguments  
Example 3

The third example outputs every single argument the Python script that is called with, except the program name itself. Therefore, we loop through the command line arguments starting with the second list element. As stated before, this element has index 1.

Example 3: Output arguments

import sys

# count the arguments
arguments = len(sys.argv) - 1

# output argument-wise
position = 1  
while (arguments >= position):  
    print ("parameter %i: %s" % (position, sys.argv[position]))
    position = position + 1

In Listing 3 the Python script is named arguments-output.py. As done in Listing 2, the output illustrates three different calls: a) without any arguments, b) with two arguments, and c) also with two arguments where the second argument is a quoted string that consists of two single words, separated by a space.

Listing 3: call the script

$ python arguments-output.py
$
$ python arguments-output.py --help me
parameter 1: --help  
parameter 2: me  
$
$ python arguments-output.py --option "long string"
parameter 1: --option  
parameter 2: long string  
$

The getopt Module

As you may have seen before the sys module splits the command line string into single fascets only. The Python getopt module goes a bit further, and extends the separation of the input string by parameter validation. Based on the getopt C function, it allows both short, and long options including a value assignment.

In practice it requires the sys module to process input data properly. To do so, both the sys module and the getopt module have to be loaded beforehand. Next, from the list of input parameters we remove the first list element (see Example 4.1), and store the remaining list of command line arguments in the variable called argumentList.

Example 4.1: Preparing the input parameters

# include standard modules
import getopt, sys

# read commandline arguments, first
fullCmdArguments = sys.argv

# - further arguments
argumentList = fullCmdArguments[1:]

print argumentList  

Now, argumentList can be parsed using the getopts() method. Before doing that, getopts() needs to know about the valid parameters. They are defined like this:

Example 4.2: Preparing the valid parameters

unixOptions = "ho:v"  
gnuOptions = ["help", "output=", "verbose"]  

This means that these arguments are seen as the valid ones, now:

------------------------------------------
long argument   short argument  with value  
------------------------------------------
--help           -h              no
--output         -o              yes
--verbose        -v              no
------------------------------------------

Next, this allows you to process the argument list. The getopt() method requires three parameters - the list of the remaining arguments, as well as both the valid UNIX, and GNU options (see table above).

The method call itself is kept in a try-catch-statement to cover errors during the evaluation. An exception is raised if an argument is discovered that is not part of the list as defined before (see Example 4.2). The Python script will print the error message to the screen, and exit with error code 2.

Example 4.3: Parsing the argument list

try:  
    arguments, values = getopt.getopt(argumentList, unixOptions, gnuOptions)
except getopt.error as err:  
    # output error, and return with an error code
    print (str(err))
    sys.exit(2)

Finally, the arguments with the corresponding values are stored in the two variables named arguments, and values. Now, you can evaluate these variables (see Example 4.4). The for-loop goes through the list of recognized arguments, one entry after the next.

Example 4.4: Specific actions according to the argument

# evaluate given options
for currentArgument, currentValue in arguments:  
    if currentArgument in ("-v", "--verbose"):
        print ("enabling verbose mode")
    elif currentArgument in ("-h", "--help"):
        print ("displaying help")
    elif currentArgument in ("-o", "--output"):
        print (("enabling special output mode (%s)") % (currentValue))

In Listing 4 you see the output of the program calls. These calls are displayed with both valid and invalid program arguments.

Listing 4: Testing the getopts() method

$ python arguments-getopt.py -h
displaying help  
$
$ python arguments-getopt.py --help
displaying help  
$
$ python arguments-getopt.py --output=green --help -v
enabling special output mode (green)  
displaying help  
enabling verbose mode  
$
$ python arguments-getopt.py -verbose
option -e not recognized  
$

The argparse Module

The argparse module is available starting in Python 3.2, and an enhancement of the optparse module that exists up to Python 2.7. The Python documentation contains an API description and a tutorial that covers all the methods in detail.

It offers a command line interface with a standardized output, whereas the former two solutions leave most of the work in your hands. argparse allows the verification of fixed and optional arguments with name checking as either UNIX or GNU style. As a default optional argument, it includes -h, along with its long version --help. This argument is accompanied by a default help message describing the argument.

Example 5 shows the parser initialization, and Listing 5 shows the basic call, followed by the help message. In contrast to the Python calls we used in the previous examples, keep in mind to use Python 3 with these examples.

Example 5: Basic usage of the argparse module

# include standard modules
import argparse

# initiate the parser
parser = argparse.ArgumentParser()  
parser.parse_args()  

Listing 5: Calling the basic argparse program

$ python3 arguments-argparse-basic.py 
$
$ python3 arguments-argparse-basic.py -h
usage: arguments-argparse-basic.py [-h]

optional arguments:  
  -h, --help  show this help message and exit
$
$ python3 arguments-argparse-basic.py --verbose
usage: arguments-argparse-basic.py [-h]  
arguments-argparse-basic.py: error: unrecognized arguments: --verbose  
$

As the next step, we will add a custom description to the help message. Initializing the parser allows an additional text. Example 6 stores the description in the text variable, which is explicitly given to the argparse class. In Listing 6 you see what the output looks like.

Example 6: Adding a description to the help message

# include standard modules
import argparse

# define the program description
text = 'This is a test program. It demonstrates how to use the argparse module with a program description.'

# initiate the parser with a description
parser = argparse.ArgumentParser(description = text)  
parser.parse_args()  

Listing 6: Calling the help with a program description

$ python3 arguments-argparse-description.py --help
usage: arguments-argparse-description.py [-h]

This is a test program. It demonstrates how to use the argparse module with a  
program description.

optional arguments:  
  -h, --help  show this help message and exit

As the final step we will add an optional argument named -V (UNIX style), which has a corresponding GNU style argument named --version. To do so we use the method add_argument() that we call with three parameters (displayed for --version, only):

The source code for that is displayed in Example 7. Reading the arguments into the variable called args is done via the parse_args() method from the parser object. Note that you submit both the short and the long version in one call. Finally, you check if the attributes args.V or args.version are set and output the version message.

Example 7: Defining an optional argument

# include standard modules
import argparse

# initiate the parser
parser = argparse.ArgumentParser()  
parser.add_argument("-V", "--version", help="show program version", action="store_true")

# read arguments from the command line
args = parser.parse_args()

# check for --version or -V
if args.version:  
    print("this is myprogram version 0.1")

Listing 7 displays the output if you call the program._

$ python3 arguments-argparse-optional.py -V
this is myprogram version 0.1  
$
$ python3 arguments-argparse-optional.py --version
this is myprogram version 0.1  

The --version argument does not require a value to be given on the command line. That's why we set the action argument to "store_true". In other cases you need an additional value, for example if you specify a certain volume, height, or width. This is shown in the next example. As a default case, please note that all the arguments are interpreted as strings.

Example 8: Defining an optional argument with value

# include standard modules
import argparse

# initiate the parser
parser = argparse.ArgumentParser()

# add long and short argument
parser.add_argument("--width", "-w", help="set output width")

# read arguments from the command line
args = parser.parse_args()

# check for --width
if args.width:  
    print("set output width to %s" % args.width)

Listing 8 shows different output values. This includes both the short and the long version as well as the help message.

Listing 8: Different output values

$ python3 arguments-argparse-optional2.py -w 10
set output width to 10  
$
$ python3 arguments-argparse-optional2.py --width 10
set output width to 10  
$
$ python3 arguments-argparse-optional2.py -h
usage: arguments-argparse-optional2.py [-h] [--width WIDTH]

optional arguments:  
  -h, --help            show this help message and exit
  --width WIDTH, -w WIDTH
                        set output width

Conclusion

In this article we showed many different methods of retrieving command line arguments in Python, including using sys, getopt, and argparse. These modules vary in functionality -- some providing much more than others. sys is fully flexible whereas both getoptand argparse require some structure. In contrast, they cover most of the complex work that sys leaves to your hands. After working through the examples provided, you should be able to determine which module suits your project best.

In this article we did not talk about the docopts module -- we just mentioned it. This module follows a totally different approach, and will be explained in detail in one of the next articles.

References

Acknowledgements

The author would like to thank Gerold Rupprecht for his support, and critics while preparing this article.

September 27, 2017 07:13 PM


PyCharm

PyCharm 2017.3 EAP 3 Out now!

The latest and greatest early access program (EAP) version of PyCharm is now available from our website:

Get PyCharm 2017.3 EAP 3

New in this version

__all__ warnings

Although Python doesn’t have support for private members, there are a couple of ways to indicate to developers that something shouldn’t be used. In addition to prefixing a method with an underscore, you can also explicitly specify public members with __all__. PyCharm can now warn you if you inadvertently use a member that’s excluded. Just be sure to enable the “Access to a protected member of a class or module” inspection in Settings | Editor | Inspections.

__all__ warning

Improved Completion for JOIN statements

Before, we would suggest the table names, now we’ll complete the entire JOIN statement. When you have your foreign keys properly set up, that is:

Improved completion for JOIN statements

Scripts for create-react-app

When creating a new react app, you can pass a scripts option to the utility to add support for additional features to your new application. This is now supported in PyCharm:

Specify Scripts Version for Create-React-App

Further Improvements

If these features sound interesting to you, try them yourself:

Get PyCharm 2017.3 EAP 3

As a reminder, PyCharm EAP versions:

If you run into any issues with this version, or another version of PyCharm, please let us know on our YouTrack. If you have other suggestions or remarks, you can reach us on Twitter, or by commenting on the blog.

September 27, 2017 04:48 PM


PyCon

PyCon 2018 Call for Proposals is Open!

It’s here! PyCon 2018’s Call for Proposals has officially opened for talks, tutorials, posters, and education summit presentations. PyCon is made by you, so we need you to share what you’re working on, how you’re working on it, what you’ve learned, what you’re learning, and so much more.

Before we dive in, the deadlines:

Who should write a proposal? Everyone!

If you’re reading this post, you should write a proposal. PyCon is about uniting and building the Python community, and we won’t advance as an open community if we’re not open with each other about what we’ve learned throughout our time in it. It isn’t about being the smartest one in the room, so we don’t just pick all of the expert talks. It’s about helping everyone move together. “A rising tide lifts all boats,” if you will.

We need beginner, intermediate, and advanced proposals on all sorts of topics. We also need beginner, intermediate, and advanced speakers to give said presentations. You don’t need to be a 20 year veteran who has spoken at dozens of conferences. On all fronts, we need all types of people. That’s what this community is comprised of, so that’s what this conference’s schedule should be made from.

When should you write your proposal? As soon as possible!

What we need now is for your submissions to start rolling in. We review proposals as soon as they’re entered, maximizing your time in front of the program committee and before they begin voting to determining the schedule. While we accept proposals right up to the deadline, the longer your proposal has been available for review, the better we can help you make it. That extra help goes a long way when you consider the large volume of proposals we anticipate receiving.

For PyCon 2017, we received 583 talk proposals, which makes for a 16% acceptance rate. The tutorial acceptance rate was at 27%, with 117 submissions.

Who can help you with your proposal? A lot of people!

Outside of our program committee, a great source of assistance with proposals comes from your local community. User groups around the world have had sessions where people bring ideas to the table and walk away with a full fledged proposal. These sessions  are especially helpful if you’re new to the process, and if you’re experienced with the process, it’s a great way for you to reach out and help people level up. We’ll be sure to share these events as we find out about them, and be sure to tell us your plans if you want to host a proposal event of your own!

We’re also trying something new for 2018 where we provide a mechanism to connect willing mentors and those seeking assistance through our site, helping not only with the brainstorming process but about the proposal, slides, and presentation itself. Read on to find out more and checkout out the “Mentoring” section of https://us.pycon.org/2018/speaking/talks/.

Where should you submit your proposal? In your dashboard!

After you’ve created an account at https://us.pycon.org/2018/account/signup/, you’ll want to create a speaker profile in your dashboard. While there, enter some details about yourself and check the various boxes about giving or receiving mentorship, as well as grant needs. Like proposals, you can come back and edit this later.

After that’s done, clicking on the “Submit a new proposal” button gives you the choice of proposal type, and from there you enter your proposal. We’ve provided some guidelines on the types of proposals you can submit, so please be sure to check out the following pages for more information:
We look forward to seeing all of your proposals in the coming months!

Written by Brian Curtin
Edited by Kaitlin Dershaw Durbin

September 27, 2017 01:22 PM


Doug Hellmann

os.path — Platform-independent Manipulation of Filenames — PyMOTW 3

Writing code to work with files on multiple platforms is easy using the functions included in the os.path module. Even programs not intended to be ported between platforms should use os.path for reliable filename parsing. Read more… This post is part of the Python Module of the Week series for Python 3. See PyMOTW.com for …

September 27, 2017 01:00 PM


Test and Code

31: I'm so sick of the testing pyramid

What started as a twitter disagreement carries over into this civil discussion of software testing.
Brian and Paul discuss testing practices such as the testing pyramid, TDD, unit testing, system testing, and balancing test effort.

Special Guest: Paul Merrill.

Sponsored By:

Links:

<p>What started as a twitter disagreement carries over into this civil discussion of software testing. <br> Brian and Paul discuss testing practices such as the testing pyramid, TDD, unit testing, system testing, and balancing test effort. </p> <ul> <li>the Testing Pyramid</li> <li>the Testing Column</li> <li>TDD</li> <li>unit testing</li> <li>balancing unit with system tests, functional tests</li> <li>API testing</li> <li>subcutaneous testing</li> <li>customer facing tests</li> </ul><p>Special Guest: Paul Merrill.</p><p>Sponsored By:</p><ul><li><a rel="nofollow" href="https://www.patreon.com/testpodcast">Test &amp; Code Patreon Supporters</a>: <a rel="nofollow" href="https://www.patreon.com/testpodcast">Thank you to Patreon Supporters</a></li><li><a rel="nofollow" href="https://nerdlettering.com">Nerdlettering</a>: <a rel="nofollow" href="https://nerdlettering.com">Love Python? Show It With Some Python Swag Custom-made Mugs and Accessories for Pythonistas, by Pythonistas.</a> Promo Code: TESTCODE</li></ul><p>Links:</p><ul><li><a title="Episode 34 - Software and Testing Models with Guest Host Brian Okken - Reflection As A Service" rel="nofollow" href="http://reflectionasaservice.com/2017/01/episode-34-software-testing-models-guest-host-brian-okken/">Episode 34 - Software and Testing Models with Guest Host Brian Okken - Reflection As A Service</a> &mdash; Cross posted to RaaS</li><li><a title="Subcutaneous Test" rel="nofollow" href="https://martinfowler.com/bliki/SubcutaneousTest.html">Subcutaneous Test</a> &mdash; I use subcutaneous test to mean a test that operates just under the UI of an application.</li><li><a title="The Forgotten Layer of the Test Automation Pyramid" rel="nofollow" href="https://www.mountaingoatsoftware.com/blog/the-forgotten-layer-of-the-test-automation-pyramid">The Forgotten Layer of the Test Automation Pyramid</a> &mdash; At the base of the test automation pyramid is unit testing.</li><li><a title="The Dreyfus model of skill acquisition" rel="nofollow" href="http://www.bumc.bu.edu/facdev-medicine/files/2012/03/Dreyfus-skill-level.pdf">The Dreyfus model of skill acquisition</a> &mdash; The Five-Stage Model of Adult Skill Acquisition</li></ul>

September 27, 2017 12:00 PM


Yasoob Khalid

Intermediate Python Book Anniversary

🙌 Hopefully this will be the last update regarding my book for a while. It has been 2 years since I self-published my “Intermediate Python” book 📖. In just a short span of 2 years I can not thank Allah enough for the level of success the book has achieved.

It has been translated into Chinese, Russian, Portuguese and Korean. A couple of days ago it hit the 🎉 🎉 520,000 readers mark 🎉 🎉 . Yes, you read that right! The book has been read by half a million people to date. This is just for the English version and I am pretty sure that Chinese version has been read by a lot more people.

The book has been opened in almost every part of the world at least once. The most rewarding experience is when you meet someone and they casually tell you how they came across your book online and actually learned something new from it.

Just wanted to thank each and every one of you who supported me. Do remember me in your prayers as I embark on new adventures. 🙂

I would also like to thank one very important person. This project wouldn’t have been possible without the inspiration and support I got from Daniel Roy Greenfeld. He has been a great mentor and I can always rely on his wisdom whenever in doubt.

Homepage: http://yasoob.me
Book link: http://book.pythontips.com


September 27, 2017 04:59 AM


Full Stack Python

Setting up PostgreSQL with Python 3 and psycopg on Ubuntu 16.04

PostgreSQL is a powerful open source relational database frequently used to create, read, update and delete Python web application data. Psycopg2 is a PostgreSQL database driver that serves as a Python client for access to the PostgreSQL server. This post explains how to install PostgreSQL on Ubuntu 16.04 and run a few basic SQL queries within a Python program.

We won't cover object-relational mappers (ORMs) in this tutorial but these steps can be used as a prerequisite to working with an ORM such as SQLAlchemy or Peewee.

Tools We Need

Our walkthrough should work with either Python 2 or 3 although all the steps were tested specifically with Python 3.5. Besides the Python interpreter, here are the other components we'll use:

If you aren't sure how to install pip and virtualenv, review the first few steps of the how to set up Python 3, Bottle and Green Unicorn on Ubuntu 16.04 LTS guide.

Install PostgreSQL

We'll install PostgreSQL via the apt package manager. There are a few packages we need since we want to both run PostgreSQL and use the psycopg2 driver with our Python programs. PostgreSQL will also be installed as a system service so we can start, stop and reload its configuration when necessary with the service command. Open the terminal and run:

sudo apt-get install postgresql libpq-dev postgresql-client postgresql-client-common

Enter your sudo password when prompted and enter 'yes' when apt asks if you want to install the new packages.

After a few moments apt will finish downloading, installing and processing.

We now have PostgreSQL installed and the PostgreSQL service is running in the background. However, we need to create a user and a database instance to really start using it. Use the sudo command to switch to the new "postgres" account.

sudo -i -u postgres

Within the "postgres" account, create a user from the command line with the createuser command. PostgreSQL will prompt you with several questions. Answer "n" to superuser and "y" to the other questions.

createuser matt -P --interactive

Awesome, now we have a PostgreSQL user that matches our Ubuntu login account. Exit out of the postgres account by pressing the "Ctrl" key along with "d" into the shell. We're back in our own user account.

Create a new database we can use for testing. You can name it "testpython" or whatever you want for your application.

createdb testpython

Now we can interact with "testpython" via the PostgreSQL command line tool.

Interacting with PostgreSQL

The psql command line client is useful for connecting directly to our PostgreSQL server without any Python code. Try out psql by using this command at the prompt:

psql

The PostgreSQL client will connect to the localhost server. The client is now ready for input:

Try out PostgreSQL's command prompt a try with commands such as \dt and \dd. We can also run SQL queries such as "SELECT * from testpython", although that won't give us back any data yet because we have not inserted any into the database. A full list of PostgreSQL commands can be found in the psql documentation.

Installing psycopg2

Now that PostgreSQL is installed and we have a non-superuser account, we can install the psycopg2 package. Let's figure out where our python3 executable is located, create a virtualenv with python3, activate the virtualenv and then install the psycopg2 package with pip. Find your python3 executable using the which command.

which python3

We will see output like what is in this screenshot.

Create a new virtualenv in either your home directory or wherever you store your Python virtualenvs. Specify the full path to your python3 installation.

# specify the system python3 installation
virtualenv --python=/usr/bin/python3 venvs/postgrestest

Activate the virtualenv.

source ~/venvs/postgrestest/bin/activate

Next we can install the psycopg2 Python package from PyPI using the pip command.

pip install psycopg2

Sweet, we've got our PostgreSQL driver installed in our virtualenv! We can now test out the installation by writing a few lines of Python code.

Using PostgreSQL from Python

Launch the Python REPL with the python or python3 command. You can also write the following code in a Python file such as "testpostgres.py" then execute it with python testpostgres.py. Make sure to replace the "user" and "password" values with your own.

import psycopg2

try:
    connect_str = "dbname='testpython' user='matt' host='localhost' " + \
                  "password='myOwnPassword'"
    # use our connection values to establish a connection
    conn = psycopg2.connect(connect_str)
    # create a psycopg2 cursor that can execute queries
    cursor = conn.cursor()
    # create a new table with a single column called "name"
    cursor.execute("""CREATE TABLE tutorials (name char(40));""")
    # run a SELECT statement - no data in there, but we can try it
    cursor.execute("""SELECT * from tutorials""")
    rows = cursor.fetchall()
    print(rows)
except Exception as e:
    print("Uh oh, can't connect. Invalid dbname, user or password?")
    print(e)

When we run the above code we won't get anything fancy, just an empty list printed out. However, in those few lines of code we've ensured our connection to our new database works and we can create new tables in it as well as query them.

That's just enough of a hook to get started writing more complicated SQL queries using psycopg2 and PostgreSQL. Make sure to check out the PostgreSQL, relational databases and object-relational mappers (ORMs) pages for more tutorials.

Questions? Tweet @fullstackpython or post a message on the Full Stack Python Facebook page.

See something wrong in this post? Fork this page's source on GitHub and submit a pull request.

September 27, 2017 04:00 AM


Brad Lucas

Ads Txt Files

Recently, my major focus has been AdTech. With this I came across the Ads.txt project. This project is a simple initiative proposed as a way to help publishers ensure inventory is sold only through authorized dealers and partners.

Publishers

Publishers create and add a file called ads.txt at the root of their sites. This text file contains a list of names and information about authorized ad networks, SSPs and exchanges that have permission to sell the publisher's inventory.

Buyers

Buyers can when purchasing inventory through and exchange can go to the publisher's domain and check that the exchange is authorized to sell inventory from the publisher by reading the publisher's ads.txt file.

Reference Crawler

The folks over at the IAB Tech Lab have released a reference implementation in Python of a simple crawler for Ads.txt files. You pass it a list of domains and it reads the ads.txt from each site and adds the contents of each to a local SQLite database.

Here is the repo.

As of this writing the master branch doesn't work. If you look at my fork of the project you'll see that the fix is to simply comment out the ADSYSTEM_DOMAIN column from the databse creation script. See the following for details.

For more reference here are a collection of links to review

September 27, 2017 04:00 AM


Jeff Hinrichs

MicroPython on ESP32: Tools – esptool.py

These are my notes on using some MicroPython specific tools in relation to a ESP32-DevKitC board.

These notes are for v2.1 of the esptool; an ESP8266 and ESP32 serial bootloader utility.

esptool has a number of functions, but I will only speak to those features required to identify the chip, get flash information and load the MicroPython firmware. See the docs for more information.

Installing esptool

(myproject) $ pip install esptool

Confirm install

(myproject) $ esptool.py version

Display chip information (chip_id)

(myproject) $ esptool.py -p /dev/ttyUSB0 chip_id
esptool.py v2.1
Connecting......
Detecting chip type... ESP32
Chip is ESP32D0WDQ6 (revision 1)
Uploading stub...
Running stub...
Stub running...
Chip ID: 0x7240ac40964
Hard resetting...

Display Flash memory information (flash_id)

(myproject) $ esptool.py -p /dev/ttyUSB0 flash_id
esptool.py v2.1
Connecting....
Detecting chip type... ESP32
Chip is ESP32D0WDQ6 (revision 1)
Uploading stub...
Running stub...
Stub running...
Manufacturer: c8
Device: 4016
Detected flash size: 4MB
Hard resetting...

Display MAC address of wifi adapter (read_mac)

(myproject) $ esptool.py -p /dev/ttyUSB0 read_mac
esptool.py v2.1
Connecting....
Detecting chip type... ESP32
Chip is ESP32D0WDQ6 (revision 1)
Uploading stub...
Running stub...
Stub running...
MAC: 24:0a:c4:09:64:c8
Hard resetting...

Loading MicroPython Firmware
You will need MicroPython firmware http://micropython.org/download#esp32
I download to a directory named images in my project folder. Since the ESP32 code is under development, I check out the GitHub commit page for the chip for any interesting new bits.

When loading to a board that does not already have MicroPython loaded, you should erase the entire flash before flashing the MicroPython firmware.

(myproject) $ esptool.py -p /dev/ttyUSB0 erase_flash
esptool.py v2.1
Connecting....
Detecting chip type... ESP32
Chip is ESP32D0WDQ6 (revision 1)
Uploading stub...
Running stub...
Stub running...
Erasing flash (this may take a while)...
Chip erase completed successfully in 5.0s
Hard resetting...

Now load the firmware with the write_flash command
The general form is:


esptool.py write_flash -p <port> -z <address> <filename>
  -p                    specify the port
  <port>                the port to use  i.e. /dev/ttyUSB0
  -z   			Compress data in transfer (default unless --no-stub is
                        specified)
  <address> <filename>  Address followed by binary filename, separated by
                        space
(myproject) $ esptool.py -p /dev/ttyUSB0 write_flash -z 0x1000 images/esp32-20170916-v1.9.2-272-g0d183d7f.bin
esptool.py v2.1
Connecting....
Detecting chip type... ESP32
Chip is ESP32D0WDQ6 (revision 1)
Uploading stub...
Running stub...
Stub running...
Configuring flash size...
Auto-detected Flash size: 4MB
Compressed 902704 bytes to 566927...
Wrote 902704 bytes (566927 compressed) at 0x00001000 in 50.0 seconds (effective 144.4 kbit/s)...
Hash of data verified.

Leaving...
Hard resetting...

Verify the firmware loaded correctly

(myproject) $ miniterm.py --raw /dev/ttyUSB0 115200
--- Miniterm on /dev/ttyUSB0  115200,8,N,1 ---
--- Quit: Ctrl+] | Menu: Ctrl+T | Help: Ctrl+T followed by Ctrl+H ---

>>>

Now do a hard reset using the reset button on the board

>>> ets Jun  8 2016 00:22:57

rst:0x1 (POWERON_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
flash read err, 1000
ets_main.c 371 
ets Jun  8 2016 00:22:57

rst:0x10 (RTCWDT_RTC_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0010,len:4
load:0x3fff0014,len:4268
load:0x40078000,len:0
load:0x40078000,len:10648
entry 0x4007a56c
I (982) cpu_start: Pro cpu up.
I (983) cpu_start: Single core mode
I (984) heap_init: Initializing. RAM available for dynamic allocation:
I (994) heap_init: At 3FFAE2A0 len 00001D60 (7 KiB): DRAM
I (1013) heap_init: At 3FFD4158 len 0000BEA8 (47 KiB): DRAM
I (1032) heap_init: At 3FFE0440 len 00003BC0 (14 KiB): D/IRAM
I (1052) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
I (1072) heap_init: At 4008F3A8 len 00010C58 (67 KiB): IRAM
I (1091) cpu_start: Pro cpu start user code
I (1152) cpu_start: Starting scheduler on PRO CPU.
OSError: [Errno 2] ENOENT
MicroPython v1.9.2-272-g0d183d7f on 2017-09-16; ESP32 module with ESP32
Type "help()" for more information.
>>> 

You should verify that the firmware specified in the banner after the reset matches that firmware that you just loaded. In this case, v1.9.2-272-g0d183d7f

May the Zen of Python be with you.

September 27, 2017 03:08 AM

September 26, 2017


Python Data

Text Analytics with Python – A book review

Text Analytics with Python: A Practical Real-World Approach to Gaining Actionable Insights from your DataThis is a book review of Text Analytics with Python: A Practical Real-World Approach to Gaining Actionable Insights from your Data by Dipanjan Sarkar

One of my go-to books for natural language processing with Python has been Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit by Steven Bird, Ewan Klein, and Edward Loper.  This has been the book for me and was one of my dissertation references.  I used this book so much, that I I had to buy a second copy of this book because I wore the first one out.  I’ve read many other NLP books but haven’t found any that could match this book – till now.

Text Analytics with Python: A Practical Real-World Approach to Gaining Actionable Insights from your Data by Dipanjan Sarkar is a fantastic book and has now taken a permanent place on my bookshelf.

Unlike many books that I run across, this book spends plenty of time talking about the theory behind things rather than just doing some hand-waving and then showing some code. In fact, there isn’t any code (that I saw) until page 41. That’s impressive these days.   Here’s a quick overview of the book’s layout:

If you have some familiarity with python and NLP, you can jump to Chapter 3 and dive into the details.

What I really like about this book is that it places theory first.  I’m a big fan of ‘learning by doing’ but I think before you can ‘do’ you need to know ‘why’ you are doing what you are doing.  The code in the book is really well done as well and uses the NLTK,  Sklearn and gensim libraries for most of the work. Additionally, there are multiple ‘build your own’ sections where the author provides a very good overview (and walk-through) of what it takes to build your own functionality for your own NLP work.

This book is highly recommended.


Links in this post:

Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit by Steven Bird, Ewan Klein, and Edward Loper.

Text Analytics with Python: A Practical Real-World Approach to Gaining Actionable Insights from your Data by Dipanjan Sarkar

 

The post Text Analytics with Python – A book review appeared first on Python Data.

September 26, 2017 08:52 PM


Weekly Python Chat

Linting Code

Code linting: it's not about fuzzy stuff, it's about readability. Let's chat about this.

September 26, 2017 08:30 PM


Stack Abuse

Reading and Writing CSV Files in Python

What is a CSV File?

A CSV (Comma Separated Values) file is a file that uses a certain formatting for storing data. This file format organizes information, containing one record per line, with each field (column) separated by a delimiter. The delimiter most commonly used is usually a comma.

This format is so common that it has actually been standardized in the RFC 4180. However, this standard isn't always followed and there is a lack of universal standard usage. The exact format used can sometime depend on the application it's being used for.

CSV files are commonly used because they're easy to read and manage, they're small in size, and fast to process/transfer. Because of these benefits, they are frequently used in software applications, ranging anywhere from online e-commerce stores to mobile apps to desktop tools. For example, Magento, an e-commerce platform, is known for its support of CSV.

In addition, many applications, such as Microsoft Excel, Notepad, and Google Docs, can be used to import or export CSV files.

The csv Python Module

The csv module implements classes to operate with CSV files. It is focused on the format that is preferred by Microsoft Excel. However, its functionality is extensive enough to work with CSV files that use different delimiters and quoting characters.

This module provides the functions reader and writer, which work in a sequential manner. It also has the DictReader and DictWriter classes to manage your CSV data in the form of a Python dictionary object.

csv.reader

The csv.reader(csvfile, dialect='excel', **fmtparams) method can be used to extract data from a file that contains CSV-formatted data.

It takes the following parameters:

This method returns a reader object, which can be iterated over to retrieve the lines of your CSV. The data is read as a list of strings. If we specify the QUOTE_NONNUMERIC format, non-quoted values are converted into float values.

An example on how to use this method is given in the Reading CSV Files section of this article.

csv.writer

The csv.writer(csvfile, dialect='excel', **fmtparams) method, which is similar to the reader method we described above, is a method that permits us to write data to a file in CSV format.

This method takes the following parameters:

A note of caution with this method: If the csvfile parameter specified is a file object, it needs to have been opened it with newline=''. If this is not specified, newlines inside quoted fields will not be interpreted correctly, and depending on the working platform, extra characters, such as '\r' may be added.

csv.DictReader and csv.DictWriter

The csv module also provides us the DictReader and DictWriter classes, which allow us to read and write to files using dictionary objects.

The class DictReader() works in a similar manner as a csv.reader, but in Python 2 it maps the data to a dictionary and in Python 3 it maps data to an OrderedDict. The keys are given by the field-names parameter.

And just like DictReader, the class DictWriter() works very similarly to the csv.writer method, although it maps the dictionary to output rows. However, be aware that since Python's dictionaries are not ordered, we cannot predict the row order in the output file.

Both of these classes includes an optional parameter to use dialects.

Dialects

A dialect, in the context of reading and writing CSVs, is a construct that allows you to create, store, and re-use various formatting parameters for your data.

Python offers two different ways to specify formatting parameters. The first is by declaring a subclass of this class, which contains the specific attributes. The second is by directly specifying the formatting parameters, using the same names as defined in the Dialect class.

Dialect supports several attributes. The most frequently used are:

Use this class to tell the csv module how to interact with your non-standard CSV data.

Versions

One important thing to note if you're using Python 2.7: it isn't as easy to support Unicode input in this version of Python, so you may need to ensure all of your input is in UTF-8 or printable ASCII characters.

A CSV File Example

We can create a CSV file easily with a text editor or even Excel. In the example below, the Excel file has a combination of numbers (1, 2 and 3) and words (Good morning, Good afternoon, Good evening), each of them in a different cell.

To save this file as a CSV, click File->Save As, then in the Save As window, select "Comma Separated Values (.csv)" under the Format dropdown. Save it as csvexample.csv for later use.

The structure of the CSV file can be seen using a text editor, such as Notepad or Sublime Text. Here, we can get the same values as in the Excel file, but separated by commas.

1,2,3  
Good morning,Good afternoon,Good evening  

We will use this file in the following examples.

We can also change the delimiter to something other than a comma, like a forward slash ('/'). Make this change in the file above, replacing all of the commas with forward slashes, and save it as csvexample2.csv for later use. It will look as follows:

1/2/3  
Good morning/Good afternoon/Good evening  

This is also valid CSV data, as long as we use the correct dialect and formatting to read/write the data, which in this case would require a '/' delimiter.

Reading CSV Files

A Simple CSV File

In this example we are going to show how you can read the csvexample.csv file, which we created and explained in a previous section. The code is as follows:

import csv

with open('csvexample.csv', newline='') as myFile:  
    reader = csv.reader(myFile)
    for row in reader:
        print(row)

In this code we open our CSV file as myFile and then use the csv.reader method to extract the data in to the reader object, which we can then iterate over to retrieve each line of our data. For this example, to show that the data was actually read, we just print it to the console.

If we save the code in a file named reader.py and we run it, the result should show the following:

$ python reader.py
['1', '2', '3']
['Good morning', 'Good afternoon', 'Good evening']

As we can see from running this code, we obtain the contents of the csvexample.csv file, which are printed to the console, except that now it is in a structured form that we can more easily work with in our code.

Changing the Delimiter

The csv module allows us to read CSV files, even when some of the file format characteristics are different from the standard formatting. For example, we can read a file with a different delimiter, like tabs, periods, or even spaces (any character, really). In our other example, csvexample2.csv, we have replaced the comma with a forward slash to demonstrate this.

In order to perform the same task as above with this new formatting, we must modify the code to indicate the new delimiter being used. In this example, we have saved the code in a file named reader2.py. The modified program is a follows:

import csv

with open('csvexample2.csv', newline='') as myFile:  
    reader = csv.reader(myFile, delimiter='/', quoting=csv.QUOTE_NONE)
    for row in reader:
        print(row)

As we can see from the code above, we have modified the third line of code by adding the delimiter parameter and assigning a value of '/' to it. This tells the method to treat all '/' characters as the separating point between column data.

We have also added the quoting parameter, and assigned it a value of csv.QUOTE_NONE, which means that the method should not use any special quoting while parsing. As expected, the result is similar to the previous example:

$ python reader2.py
['1', '2', '3']
['Good morning', 'Good afternoon', 'Good evening']

As you can see, thanks to the small changes in the code we still get the same expected result.

Creating a Dialect

The csv module allows us to create a dialect with the specific characteristics of our CSV file. Thus, the same result from above can also be achieved with the following code:

import csv

csv.register_dialect('myDialect', delimiter='/', quoting=csv.QUOTE_NONE)

with open('csvexample2.csv', newline='') as myFile:  
   reader = csv.reader(myFile, dialect='myDialect')
   for row in reader:
       print(row) 

Here we create and register our own named dialect, which in this case uses the same formatting parameters as before (forward slashes and no quoting). We then specify to csv.reader that we want to use the dialect we registered by passing its name as the dialect parameter.

If we save this code in a file named reader3.py and run it, the result will be as follows:

$ python reader3.py
['1', '2', '3']
['Good morning', 'Good afternoon', 'Good evening']

Again, this output is exactly the same as above, which means we correctly parsed the non-standard CSV data.

Writing to CSV Files

Just like reading CSVs, the csv module appropriately provides plenty of functionality to write data to a CSV file as well. The writer object presents two functions, namely writerow() and writerows(). The difference between them, as you can probably tell from the names, is that the first function will only write one row, and the function writerows() writes several rows at once.

The code in the example below creates a list of data, with each element in the outer list representing a row in the CSV file. Then, our code opens a CSV file named csvexample3.csv, creates a writer object, and writes our data to the file using the writerows() method.

import csv

myData = [[1, 2, 3], ['Good Morning', 'Good Evening', 'Good Afternoon']]  
myFile = open('csvexample3.csv', 'w')  
with myFile:  
   writer = csv.writer(myFile)
   writer.writerows(myData)

The resulting file, csvexample3.csv, should have the following text:

1,2,3  
Good Morning,Good Evening,Good Afternoon  

The writer object also caters to other CSV formats as well. The following example creates and uses a dialect with '/' as delimiter:

import csv

myData = [[1, 2, 3], ['Good Morning', 'Good Evening', 'Good Afternoon']]

csv.register_dialect('myDialect', delimiter='/', quoting=csv.QUOTE_NONE)

myFile = open('csvexample4.csv', 'w')  
with myFile:  
   writer = csv.writer(myFile, dialect='myDialect')
   writer.writerows(myData)

Similar to our "reading" example, we create a dialect in the same way (via csv.register_dialect()) and use it in the same way, by specifying it by name.

And again, running the code above results in the following output to our new csvexample4.csv file:

1/2/3  
Good Morning/Good Evening/Good Afternoon  

Using Dictionaries

In many cases, our data won't be formatted as a 2D array (as we saw in the previous examples), and it would be nice if we had better control over the data we read. To help with this problem, the csv module provides helper classes that lets us read/write our CSV data to/from dictionary objects, which makes the data much easier to work with.

Interacting with your data in this way is much more natural for most Python applications and will be easier to integrate in to your code thanks to the familiarity of dict.

Reading a CSV File with DictReader

Using your favorite text editor, create a CSV file named countries.csv with the following content:

country,capital  
France,Paris  
Italy,Rome  
Spain,Madrid  
Russia,Moscow  

Now, the format of this data might look a little bit different than our examples before. The first row in this file contains the field/column names, which provides a label for each column of data. The rows in this file contain pairs of values (country, capital) separated by a comma. These labels are optional, but tend to be very helpful, especially when you have to actually look this data yourself.

In order to read this file, we create the following code:

import csv  

with open('countries.csv') as myFile:  
    reader = csv.DictReader(myFile)
    for row in reader:
        print(row['country'])

We still loop through each row of the data, but notice how we can now access each row's columns by their label, which in this case is the country. If we wanted, we could also access the capital with row['capital'].

Running the code results in the following:

$ python readerDict.py
France  
Italy  
Spain  
Russia  

Writing to a File with DictWriter

We can also create a CSV file using our dictionaries. In the code below, we create a dictionary with the country and capital fields. Then we create a writer object that writes data to our countries.csv file, which has the set of fields previously defined with the list myFields.

Following that, we first write the header row with the writeheader() method, and then the pairs of values using the writerow() method. Each value's position in the row is specified using the column label. You can probably imagine how useful this becomes when you have tens or even hundreds of columns in your CSV data.

import csv

myFile = open('countries.csv', 'w')  
with myFile:  
    myFields = ['country', 'capital']
    writer = csv.DictWriter(myFile, fieldnames=myFields)    
    writer.writeheader()
    writer.writerow({'country' : 'France', 'capital': 'Paris'})
    writer.writerow({'country' : 'Italy', 'capital': 'Rome'})
    writer.writerow({'country' : 'Spain', 'capital': 'Madrid'})
    writer.writerow({'country' : 'Russia', 'capital': 'Moscow'})

And finally, running this code gives us the correct CSV output, with labels and all:

country,capital  
France,Paris  
Italy,Rome  
Spain,Madrid  
Russia,Moscow  

Conclusion

CSV files are a handy file storage format that many developers use in their projects. They're are small, easy to manage, and widely used throughout software development. Lucky for you, Python has a dedicated module for them that provides flexible methods and classes for managing CSV files in a straightforward and efficient manner.

In this article we showed you how to use the csv Python module to both read and write CSV data to a file. In addition to this, we also showed how to create dialects, and use helper classes like DictReader and DictWriter to read and write CSVs from/to dict objects.

September 26, 2017 07:00 PM


Continuum Analytics Blog

Anaconda and Microsoft Partner to Deliver Python-Powered Machine Learning

Strata Data Conference, NEW YORK––September 26, 2017––Anaconda, Inc., the most popular Python data science platform provider, today announced it is partnering with Microsoft to embed Anaconda into Azure Machine Learning, Visual Studio and SQL Server to deliver data insights in real time.

September 26, 2017 01:00 PM


hypothesis.works articles

When multiple bugs attack

When Hypothesis finds an example triggering a bug, it tries to shrink the example down to something simpler that triggers it. This is a pretty common feature, and most property-based testing libraries implement something similar (though there are a number of differences between them). Stand-alone test case reducers are also fairly common, as it’s a useful thing to be able to do when reporting bugs in external projects - rather than submitting a giant file triggering the bug, a good test case reducer can often shrink it down to a couple of lines.

But there’s a problem with doing this: How do you know that the bug you started with is the same as the bug you ended up with?

This isn’t just an academic question. It’s very common for the bug you started with to slip to another one.

Consider for example, the following test:

from hypothesis import given, strategies as st

def mean(ls):
    return sum(ls) / len(ls)


@given(st.lists(st.floats()))
def test(ls):
    assert min(ls) <= mean(ls) <= max(ls)

This has a number of interesting ways to fail: We could pass NaN, we could pass [-float('inf'), +float('inf')], we could pass numbers which trigger a precision error, etc.

But after test case reduction, we’ll pass the empty list and it will fail because we tried to take the min of an empty sequence.

This isn’t necessarily a huge problem - we’re still finding a bug after all (though in this case as much in the test as in the code under test) - and sometimes it’s even desirable - you find more bugs this way, and sometimes they’re ones that Hypothesis would have missed - but often it’s not, and an interesting and rare bug slips to a boring and common one.

Historically Hypothesis has had a better answer to this than most - because of the Hypothesis example database, all intermediate bugs are saved and a selection of them will be replayed when you rerun the test. So if you fix one bug then rerun the test, you’ll find the other bugs that were previously being hidden from you by that simpler bug.

But that’s still not a great user experience - it means that you’re not getting nearly as much information as you could be, and you’re fixing bugs in Hypothesis’s priority order rather than yours. Wouldn’t it be better if Hypothesis just told you about all of the bugs it found and you could prioritise them yourself?

Well, as of Hypothesis 3.29.0, released a few weeks ago, now it does!

If you run the above test now, you’ll get the following:

Falsifying example: test(ls=[nan])
Traceback (most recent call last):
  File "/home/david/hypothesis-python/src/hypothesis/core.py", line 671, in run
    print_example=True, is_final=True
  File "/home/david/hypothesis-python/src/hypothesis/executors.py", line 58, in default_new_style_executor
    return function(data)
  File "/home/david/hypothesis-python/src/hypothesis/core.py", line 120, in run
    return test(*args, **kwargs)
  File "broken.py", line 8, in test
    def test(ls):
  File "/home/david/hypothesis-python/src/hypothesis/core.py", line 531, in timed_test
    result = test(*args, **kwargs)
  File "broken.py", line 9, in test
    assert min(ls) <= mean(ls) <= max(ls)
AssertionError

Falsifying example: test(ls=[])
Traceback (most recent call last):
  File "/home/david/hypothesis-python/src/hypothesis/core.py", line 671, in run
    print_example=True, is_final=True
  File "/home/david/hypothesis-python/src/hypothesis/executors.py", line 58, in default_new_style_executor
    return function(data)
  File "/home/david/hypothesis-python/src/hypothesis/core.py", line 120, in run
    return test(*args, **kwargs)
  File "broken.py", line 8, in test
    def test(ls):
  File "/home/david/hypothesis-python/src/hypothesis/core.py", line 531, in timed_test
    result = test(*args, **kwargs)
  File "broken.py", line 9, in test
    assert min(ls) <= mean(ls) <= max(ls)
ValueError: min() arg is an empty sequence

You can add @seed(67388524433957857561882369659879357765) to this test to reproduce this failure.
Traceback (most recent call last):
  File "broken.py", line 12, in <module>
    test()
  File "broken.py", line 8, in test
    def test(ls):
  File "/home/david/hypothesis-python/src/hypothesis/core.py", line 815, in wrapped_test
    state.run()
  File "/home/david/hypothesis-python/src/hypothesis/core.py", line 732, in run
    len(self.falsifying_examples,)))
hypothesis.errors.MultipleFailures: Hypothesis found 2 distinct failures.

(The stack traces are a bit noisy, I know. We have an issue open about cleaning them up).

All of the different bugs are minimized simultaneously and take full advantage of Hypothesis’s example shrinking, so each bug is as easy (or hard) to read as if it were the only bug we’d found.

This isn’t perfect: The heuristic we use for determining if two bugs are the same is whether they have the same exception type and the exception is thrown from the same line. This will necessarily conflate some bugs that are actually different - for example, [float('nan')], [-float('inf'), float('inf')] and [3002399751580415.0, 3002399751580415.0, 3002399751580415.0] each trigger the assertion in the test, but they are arguably “different” bugs.

But that’s OK. The heuristic is deliberately conservative - the point is not that it can distinguish whether any two examples are the same bug, just that any two examples it distinguishes are different enough that it’s interesting to show both, and this heuristic definitely manages that.

As far as I know this is a first in property-based testing libraries (though something like it is common in fuzzing tools, and theft is hot on our tail with something similar) and there’s been some interesting related but mostly orthogonal research in Erlang QuickCheck.

It was also surprisingly easy.

A lot of things went right in writing this feature, some of them technical, some of them social, somewhere in between.

The technical ones are fairly straightforward: Hypothesis’s core model turned out to be very well suited to this feature. Because Hypothesis has a single unified intermediate representation which defines a total ordering for simplicity, adapting Hypothesis to shrink multiple things at once was quite easy - whenever we attempt a shrink and it produces a different bug than the one we were looking for, we compare it to our existing best example for that bug and replace it if the current one is better (or we’ve discovered a new bug). We then just repeatedly run the shrinking process for each bug we know about until they’ve all been fully shrunk.

This is in a sense not surprising - I’ve been thinking about the problem of multiple-shrinking for a long time and, while this is the first time it’s actually appeared in Hypothesis, the current choice of model was very much informed by it.

The social ones are perhaps more interesting. Certainly I’m very pleased with how they turned out here.

The first is that this work emerged tangentially from the recent Stripe funded work - Stripe paid me to develop some initial support for testing Pandas code with Hypothesis, and I observed a bunch of bug slippage happening in the wild while I was testing that (it turns out there are quite a lot of ways to trigger exceptions from Pandas - they weren’t really Pandas bugs so much as bugs in the Pandas integration, but they still slipped between several different exception types), so that was what got me thinking about this problem again.

Not by accident, this feature also greatly simplified the implementation of the new deadline feature that Smarkets funded, which was going to have to have a lot of logic about how deadlines and bugs interacted, but all that went away as soon as we were able to handle multiple bugs sensibly.

This has been a relatively consistent theme in Hypothesis development - practical problems tend to spark related interesting theoretical developments. It’s not a huge exaggeration to say that the fundamental Hypothesis model exists because I wanted to support testing Django nicely. So the recent funded development from Stripe and Smarkets has been a great way to spark a lot of seemingly unrelated development and improve Hypothesis for everyone, even outside the scope of the funded work.

Another thing that really helped here is our review process, and the review from Zac in particular.

This wasn’t the feature I originally set out to develop. It started out life as a much simpler feature that used much of the same machinery, and just had a goal of avoiding slipping to new errors all together. Zac pushed back with some good questions around whether this was really the correct thing to do, and after some experimentation and feedback I eventually hit on the design that lead to displaying all of the errors.

Our review handbook emphasises that code review is a collaborative design process, and I feel this was a particularly good example of that. We’ve created a great culture of code review, and we’re reaping the benefits (and if you want to get in on it, we could always use more people able and willing to do review…).

All told, I’m really pleased with how this turned out. I think it’s a nice example of getting a lot of things right up front and this resulting in a really cool new feature.

I’m looking forward to seeing how it behaves in the wild. If you notice any particularly fun examples, do let me know, or write up a post about them yourself!

Read more...

September 26, 2017 12:00 PM


Python Software Foundation

Join the Python Developers Survey 2017: Share and learn about the community

2017 is drawing to a close and we are super-excited to start the official Python Developers Survey 2017!

We’ve created this survey specially for Python developers who use it as their primary or supplementary language. We expect the survey findings to help us map an accurate landscape of the Python developer community and to provide insight into the current major trends in the Python community.

Please take a few minutes to complete the Python Developers Survey 2017!

Your valuable opinion and feedback will help us better understand how different Python developers use Python and related frameworks, tools and technologies. We also hope you'll have fun going through the questions.

The survey is organized in partnership between the Python Software Foundation and JetBrains. After the survey is over, we will publish the aggregated results and randomly choose 100 winners (from those who complete the survey in its entirety), who will each receive an amazing Python Surprise Gift Pack.

September 26, 2017 11:01 AM


Simple is Better Than Complex

How to Create Django Data Migrations

Data Migration is a very convenient way to change the data in the database in conjunction with changes in the schema. They work like a regular schema migration. Django keep track of dependencies, order of execution and if the application already applied a given data migration or not.

A common use case of data migrations is when we need to introduce new fields that are non-nullable. Or when we are creating a new field to store a cached count of something, so we can create the new field and add the initial count.

In this post we are going to explore a simple example that you can very easily extend and modify for your needs.


Data Migrations

Let’s suppose we have an app named blog, which is installed in our project’s INSTALLED_APPS.

The blog have the following model definition:

blog/models.py

from django.db import models

class Post(models.Model):
    title = models.CharField(max_length=255)
    date = models.DateTimeField(auto_now_add=True)
    content = models.TextField()

    def __str__(self):
        return self.title

The application is already using this Post model; it’s already in production and there are plenty of data stored in the database.

id title date content
1 How to Render Django Form Manually 2017-09-26 11:01:20.547000 […]
2 How to Use Celery and RabbitMQ with Django 2017-09-26 11:01:39.251000 […]
3 How to Setup Amazon S3 in a Django Project 2017-09-26 11:01:49.669000 […]
4 How to Configure Mailgun To Send Emails in a Django Project 2017-09-26 11:02:00.131000 […]

Now let’s say we want to introduce a new field named slug which will be used to compose the new URLs of the blog. The slug field must be unique and not null.

Generally speaking, always add new fields either as null=True or with a default value. If we can’t solve the problem with the default parameter, first create the field as null=True then create a data migration for it. After that we can then create a new migration to set the field as null=False.

Here is how we can do it:

blog/models.py

from django.db import models

class Post(models.Model):
    title = models.CharField(max_length=255)
    date = models.DateTimeField(auto_now_add=True)
    content = models.TextField()
    slug = models.SlugField(null=True)

    def __str__(self):
        return self.title

Create the migration:

python manage.py makemigrations blog

Migrations for 'blog':
  blog/migrations/0002_post_slug.py
    - Add field slug to post

Apply it:

python manage.py migrate blog

Operations to perform:
  Apply all migrations: blog
Running migrations:
  Applying blog.0002_post_slug... OK

At this point, the database already have the slug column.

id title date content slug
1 How to Render Django Form Manually 2017-09-26 11:01:20.547000 […] (null)
2 How to Use Celery and RabbitMQ with Django 2017-09-26 11:01:39.251000 […] (null)
3 How to Setup Amazon S3 in a Django Project 2017-09-26 11:01:49.669000 […] (null)
4 How to Configure Mailgun To Send Emails in a Django Project 2017-09-26 11:02:00.131000 […] (null)

Create an empty migration with the following command:

python manage.py makemigrations blog --empty

Migrations for 'blog':
  blog/migrations/0003_auto_20170926_1105.py

Now open the file 0003_auto_20170926_1105.py, and it should have the following contents:

blog/migrations/0003_auto_20170926_1105.py

# -*- coding: utf-8 -*-
# Generated by Django 1.11.5 on 2017-09-26 11:05
from __future__ import unicode_literals

from django.db import migrations


class Migration(migrations.Migration):

    dependencies = [
        ('blog', '0002_post_slug'),
    ]

    operations = [
    ]

Then here in this file, we can create a function that can be executed by the RunPython command:

blog/migrations/0003_auto_20170926_1105.py

# -*- coding: utf-8 -*-
# Generated by Django 1.11.5 on 2017-09-26 11:05
from __future__ import unicode_literals

from django.db import migrations
from django.utils.text import slugify


def slugify_title(apps, schema_editor):
    '''
    We can't import the Post model directly as it may be a newer
    version than this migration expects. We use the historical version.
    '''
    Post = apps.get_model('blog', 'Post')
    for post in Post.objects.all():
        post.slug = slugify(post.title)
        post.save()


class Migration(migrations.Migration):

    dependencies = [
        ('blog', '0002_post_slug'),
    ]

    operations = [
        migrations.RunPython(slugify_title),
    ]

In the example above we are using the slugify utility function. It takes a string as parameter and transform it in a slug. See below some examples:

from django.utils.text import slugify

slugify('Hello, World!')
'hello-world'

slugify('How to Extend the Django User Model')
'how-to-extend-the-django-user-model'

Anyway, the function used by the RunPython method to create a data migration, expects two parameters: apps and schema_editor. The RunPython will feed those parameters. Also remember to import models using the apps.get_model('app_name', 'model_name') method.

Save the file and execute the migration as you would do with a regular model migration:

python manage.py migrate blog
Operations to perform:
  Apply all migrations: blog
Running migrations:
  Applying blog.0003_auto_20170926_1105... OK

Now if we check the database:

id title date content slug
1 How to Render Django Form Manually 2017-09-26 11:01:20.547000 […] how-to-render-django-form-manually
2 How to Use Celery and RabbitMQ with Django 2017-09-26 11:01:39.251000 […] how-to-use-celery-and-rabbitmq-with-django
3 How to Setup Amazon S3 in a Django Project 2017-09-26 11:01:49.669000 […] how-to-setup-amazon-s3-in-a-django-project
4 How to Configure Mailgun To Send Emails in a Django Project 2017-09-26 11:02:00.131000 […] how-to-configure-mailgun-to-send-emails-in-a-django-project

Every Post entry have a value, so we can safely change the switch from null=True to null=False. And since all the values are unique, we can also add the unique=True flag.

Change the model:

blog/models.py

from django.db import models

class Post(models.Model):
    title = models.CharField(max_length=255)
    date = models.DateTimeField(auto_now_add=True)
    content = models.TextField()
    slug = models.SlugField(null=False, unique=True)

    def __str__(self):
        return self.title

Create a new migration:

python manage.py makemigrations blog

This time you will see the following prompt:

You are trying to change the nullable field 'slug' on post to non-nullable without a default; we can't do that
(the database needs something to populate existing rows).
Please select a fix:
 1) Provide a one-off default now (will be set on all existing rows with a null value for this column)
 2) Ignore for now, and let me handle existing rows with NULL myself (e.g. because you added a RunPython or RunSQL
 operation to handle NULL values in a previous data migration)
 3) Quit, and let me add a default in models.py
Select an option:

Select option 2 by typing “2” in the terminal.

Migrations for 'blog':
  blog/migrations/0004_auto_20170926_1422.py
    - Alter field slug on post

Now we can safely apply the migration:

python manage.py migrate blog
Operations to perform:
  Apply all migrations: blog
Running migrations:
  Applying blog.0004_auto_20170926_1422... OK

Conclusions

Data migrations are tricky sometimes. When creating data migration for your projects, always examine the production data first. The implementation of the slugify_title I used in the example is a little naïve, because it could generate duplicate titles for a large dataset. Always test the data migrations first in a staging environment, so to avoid breaking things in production.

It’s also important to do it step-by-step, so you can feel in control of the changes you are introducing. Note that here I create three migration files for a simple data migration.

As you can see, it’s fairly easy to create this type of migration. It’s also very flexible. You could for example load an external text file to insert the data into a new column for example.

The source code used in this blog post is available on GitHub: https://github.com/sibtc/data-migrations-example

September 26, 2017 09:00 AM


S. Lott

Learning About Data Science.

I work with data scientists. I am not a scientist.


This kind of thing on scikit learn is helpful for understanding what they're trying to do and how I can help.

September 26, 2017 08:00 AM


eGenix.com

Python Meeting Düsseldorf - 2017-09-27

The following text is in German, since we're announcing a regional user group meeting in Düsseldorf, Germany.

Ankündigung

Das nächste Python Meeting Düsseldorf findet an folgendem Termin statt:

27.09.2017, 18:00 Uhr
Raum 1, 2.OG im Bürgerhaus Stadtteilzentrum Bilk
Düsseldorfer Arcaden, Bachstr. 145, 40217 Düsseldorf


Neuigkeiten

Bereits angemeldete Vorträge

Dr. Uwe Ziegenhagen
        "Datenanalyse mit Python pandas"

Charlie Clark
        "Typ-Systeme in Python"

Weitere Vorträge können gerne noch angemeldet werden. Bei Interesse, bitte unter info@pyddf.de melden.

Startzeit und Ort

Wir treffen uns um 18:00 Uhr im Bürgerhaus in den Düsseldorfer Arcaden.

Das Bürgerhaus teilt sich den Eingang mit dem Schwimmbad und befindet sich an der Seite der Tiefgarageneinfahrt der Düsseldorfer Arcaden.

Über dem Eingang steht ein großes "Schwimm’ in Bilk" Logo. Hinter der Tür direkt links zu den zwei Aufzügen, dann in den 2. Stock hochfahren. Der Eingang zum Raum 1 liegt direkt links, wenn man aus dem Aufzug kommt.

>>> Eingang in Google Street View

Einleitung

Das Python Meeting Düsseldorf ist eine regelmäßige Veranstaltung in Düsseldorf, die sich an Python Begeisterte aus der Region wendet.

Einen guten Überblick über die Vorträge bietet unser PyDDF YouTube-Kanal, auf dem wir Videos der Vorträge nach den Meetings veröffentlichen.

Veranstaltet wird das Meeting von der eGenix.com GmbH, Langenfeld, in Zusammenarbeit mit Clark Consulting & Research, Düsseldorf:

Programm

Das Python Meeting Düsseldorf nutzt eine Mischung aus (Lightning) Talks und offener Diskussion.

Vorträge können vorher angemeldet werden, oder auch spontan während des Treffens eingebracht werden. Ein Beamer mit XGA Auflösung steht zur Verfügung.

(Lightning) Talk Anmeldung bitte formlos per EMail an info@pyddf.de

Kostenbeteiligung

Das Python Meeting Düsseldorf wird von Python Nutzern für Python Nutzer veranstaltet.

Da Tagungsraum, Beamer, Internet und Getränke Kosten produzieren, bitten wir die Teilnehmer um einen Beitrag in Höhe von EUR 10,00 inkl. 19% Mwst. Schüler und Studenten zahlen EUR 5,00 inkl. 19% Mwst.

Wir möchten alle Teilnehmer bitten, den Betrag in bar mitzubringen.

Anmeldung

Da wir nur für ca. 20 Personen Sitzplätze haben, möchten wir bitten, sich per EMail anzumelden. Damit wird keine Verpflichtung eingegangen. Es erleichtert uns allerdings die Planung.

Meeting Anmeldung bitte formlos per EMail an info@pyddf.de

Weitere Informationen

Weitere Informationen finden Sie auf der Webseite des Meetings:

              http://pyddf.de/

Viel Spaß !

Marc-Andre Lemburg, eGenix.com

September 26, 2017 08:00 AM