Planet Python
Last update: September 29, 2017 01:47 PM
September 28, 2017
Tomasz Früboes
Flask and bower – download jQuery, others automatically
In the last couple of weeks, I have experimented a lot with Flask and web development. Since this is a learning phase, I tend to set up lots of small projects, where a couple of external libraries (such as Bootstrap or jQuery) usually need to be downloaded. As I am rather lazy, using an external tool (bower) to download… Continue reading
Stack Abuse
Read a File Line-by-Line in Python
Introduction
Over the course of my working life I have had the opportunity to use many programming concepts and technologies to do countless things. Some of these things involve relatively low value fruits of my labor, such as automating the error prone or mundane like report generation, task automation, and general data reformatting. Others have been much more valuable, such as developing data products, web applications, and data analysis and processing pipelines. One thing that is notable about nearly all of these projects is the need to simply open a file, parse its contents, and do something with it.
However, what do you do when the file you are trying to consume is quite large? Say the file is several GB of data or larger? Again, this has been another frequent aspect of my programming career, which has primarily been spent in the BioTech sector, where files up to a TB in size are quite common.
The answer to this problem is to read in chunks of a file at a time, process it, then free it from memory so you can pull in and process another chunk until the whole massive file has been processed. While it is up to the programmer to determine a suitable chunk size, perhaps the most commonly used is simply a line of a file at a time.
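As a rough illustration of the chunked approach, here is a minimal sketch; the chunk size, the file name, and the process() helper are hypothetical placeholders for illustration, not code from this article.
def process(chunk):
    # stand-in for your own processing logic
    print(len(chunk))

chunk_size = 64 * 1024  # read 64 KB at a time; tune this to your data

with open('large_file.txt') as fp:
    while True:
        chunk = fp.read(chunk_size)
        if not chunk:
            break  # end of file reached
        process(chunk)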
This is what we will be discussing in this article - memory management by reading in a file line-by-line in Python. The code used in this article can be found in the following GitHub repo.
Basic File IO in Python
Being a great general purpose programming language, Python provides a great deal of useful file IO functionality in its standard library of built-in functions and modules. The built-in open() function is what you use to open a file object for either reading or writing purposes.
fp = open('path/to/file.txt', 'r')
The open() function takes in multiple arguments. We will be focusing on the first two, with the first being a positional string parameter representing the path to the file that should be opened. The second optional parameter is also a string which specifies the mode of interaction you intend for the file object being returned by the function call. The most common modes are listed in the table below, with the default being 'r' for reading.
| Mode | Description |
|---|---|
| `r` | Open for reading plain text |
| `w` | Open for writing plain text |
| `a` | Open an existing file for appending plain text |
| `rb` | Open for reading binary data |
| `wb` | Open for writing binary data |
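For instance, the different modes might be used like this; the file names below are hypothetical and only for illustration (closing files is covered next).
fp = open('output.txt', 'w')    # create or overwrite a file for writing text
fp.write("a line of text\n")
fp.close()

fp = open('output.txt', 'a')    # append to the existing file
fp.write("another line\n")
fp.close()

fp = open('image.png', 'rb')    # read raw bytes
header = fp.read(8)
fp.close()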
Once you have written or read all of the desired data for a file object you need to close the file so that resources can be reallocated on the operating system that the code is running on.
fp.close()
You will often see many code snippets on the internet or in programs in the wild that do not explicitly close file objects that have been generated in accord with the example above. It is always good practice to close a file object resource, but many of us either are too lazy or forgetful to do so or think we are smart because documentation suggests that an open file object will self close once a process terminates. This is not always the case.
Instead of harping on how important it is to always call close() on a file object, I would like to provide an alternate and more elegant way to open a file object and ensure that the Python interpreter cleans up after us :)
with open('path/to/file.txt') as fp:
# do stuff with fp
By simply using the with keyword (introduced in Python 2.5) to wrap our code for opening a file object, the internals of Python will do something similar to the following code to ensure that, no matter what, the file object is closed after use.
try:
fp = open('path/to/file.txt')
# do stuff here
finally:
fp.close()
Reading Line by Line
Now, let's get to actually reading in a file. The file object returned from open() has three common explicit methods (read, readline, and readlines) to read in data and one more implicit way.
The read method will read all the data into one text string. This is useful for smaller files where you would like to do text manipulation on the entire file, or whatever else suits you. Then there is readline, which reads a file incrementally, one line at a time, and returns each line as a string. The last explicit method, readlines, will read all the lines of a file and return them as a list of strings.
As mentioned earlier, you can use these methods to load only small chunks of the file at a time. To do this, you can pass a parameter telling them how many bytes to load at a time. This is the only argument these methods accept.
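To make the distinction concrete, here is a small hedged sketch of all three methods; example.txt is a hypothetical file used only for illustration.
# example.txt is a hypothetical small file used only for illustration
with open('example.txt') as fp:
    whole_text = fp.read()        # the entire file as one string

with open('example.txt') as fp:
    first_line = fp.readline()    # a single line, newline included

with open('example.txt') as fp:
    all_lines = fp.readlines()    # a list of strings, one per line

# each method also accepts a size argument, e.g. read at most 1024 bytes
with open('example.txt') as fp:
    chunk = fp.read(1024)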
One implementation for reading a text file one line at a time might look like the following code. Note that for the remainder of this article I will be demonstrating how to read in the text of the book "The Iliad of Homer", which can be found at gutenberg.org, as well as in the GitHub repo that holds the code for this article.
In readline.py you will find the following code. In the terminal if you run $ python readline.py you can see the output of reading all the lines of the Iliad, as well as their line numbers.
filepath = 'Iliad.txt'
with open(filepath) as fp:
    line = fp.readline()
    cnt = 1
    while line:
        print("Line {}: {}".format(cnt, line.strip()))
        line = fp.readline()
        cnt += 1
The above code snippet opens a file object stored as a variable called fp, then reads in a line at a time by calling readline on that file object iteratively in a while loop and prints it to the console.
Running this code you should see something like the following:
$ python readline.py
Line 1: BOOK I
Line 2:
Line 3: The quarrel between Agamemnon and Achilles--Achilles withdraws
Line 4: from the war, and sends his mother Thetis to ask Jove to help
Line 5: the Trojans--Scene between Jove and Juno on Olympus.
Line 6:
Line 7: Sing, O goddess, the anger of Achilles son of Peleus, that brought
Line 8: countless ills upon the Achaeans. Many a brave soul did it send
Line 9: hurrying down to Hades, and many a hero did it yield a prey to dogs and
Line 10: vultures, for so were the counsels of Jove fulfilled from the day on
...
While this is perfectly fine, there is one final way that I mentioned fleetingly earlier, which is less explicit but a bit more elegant, and which I greatly prefer. This final way of reading a file line-by-line iterates over the file object in a for loop, assigning each line to a special variable called line. The above code snippet can be replicated with the following code, which can be found in the Python script forlinein.py:
filepath = 'Iliad.txt'
with open(filepath) as fp:
    for cnt, line in enumerate(fp):
        print("Line {}: {}".format(cnt, line))
In this implementation we are taking advantage of built-in Python functionality that allows us to iterate over the file object implicitly using a for loop in combination with the iterable object fp. Not only is this simpler to read but it also takes fewer lines of code to write, which is always a best practice worthy of following.
An Example Application
I would be remiss to write an article about how to consume information in a text file without demonstrating at least a trivial usage of such a worthy skill. That being said, I will be demonstrating a small application that can be found in wordcount.py, which calculates the frequency of each word present in "The Iliad of Homer", used in the previous examples.
import sys
import os

def main():
    filepath = sys.argv[1]

    if not os.path.isfile(filepath):
        print("File path {} does not exist. Exiting...".format(filepath))
        sys.exit()

    bag_of_words = {}
    with open(filepath) as fp:
        cnt = 0
        for line in fp:
            print("line {} contents {}".format(cnt, line))
            record_word_cnt(line.strip().split(' '), bag_of_words)
            cnt += 1
    sorted_words = order_bag_of_words(bag_of_words, desc=True)
    print("Most frequent 10 words {}".format(sorted_words[:10]))

def order_bag_of_words(bag_of_words, desc=False):
    words = [(word, cnt) for word, cnt in bag_of_words.items()]
    return sorted(words, key=lambda x: x[1], reverse=desc)

def record_word_cnt(words, bag_of_words):
    for word in words:
        if word != '':
            if word.lower() in bag_of_words:
                bag_of_words[word.lower()] += 1
            else:
                # start the count at 1 for the first occurrence of a word
                bag_of_words[word.lower()] = 1

if __name__ == '__main__':
    main()
The above code represents a command line Python script that expects a file path passed in as an argument. The script uses the os module to make sure that the passed in file path exists on disk. If the path exists, each line of the file is read and passed, as a list of strings delimited by the spaces between words, to a function called record_word_cnt, along with a dictionary called bag_of_words. The record_word_cnt function counts each instance of every word and records it in the bag_of_words dictionary.
Once all the lines of the file are read and recorded in the bag_of_words dictionary, a final call to order_bag_of_words is made, which returns a list of tuples in (word, word count) format, sorted by word count. The returned list of tuples is used to print the ten most frequently occurring words.
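As an aside, the same counting could be written more compactly with collections.Counter from the standard library; this is only an alternative sketch, not the wordcount.py from the repo.
from collections import Counter

bag_of_words = Counter()
with open('Iliad.txt') as fp:
    for line in fp:
        # lower-case each word and skip empty strings, as record_word_cnt does
        bag_of_words.update(word.lower() for word in line.strip().split(' ') if word)

print(bag_of_words.most_common(10))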
Conclusion
So, in this article we have explored two ways to read a text file line-by-line, including one that I feel is a bit more Pythonic (the second way, demonstrated in forlinein.py). To wrap things up, I presented a trivial application that is potentially useful for reading in and preprocessing data that could be used for text analytics or sentiment analysis.
As always I look forward to your comments and I hope you can use what has been discussed to develop exciting and useful applications.
Mike Driscoll
wxPython: All About Accelerators
The wxPython toolkit supports keyboard shortcuts via the concept of Accelerators and Accelerator Tables. You can also bind directly to key presses, but in a lot of cases you will want to go with Accelerators. An accelerator gives you the ability to add a keyboard shortcut to your application, such as the ubiquitous "CTRL+S" that most applications use for saving a file. As long as your application has focus, this keyboard shortcut can be added trivially.
Note that you will normally add an accelerator table to your wx.Frame instance. If you happen to have multiple frames in your application, then you may need to add an accelerator table to multiple frames depending on your design.
Let’s take a look at a simple example:
import wx

class MyForm(wx.Frame):

    def __init__(self):
        wx.Frame.__init__(self, None, title="Accelerator Tutorial",
                          size=(500,500))

        # Add a panel so it looks correct on all platforms
        panel = wx.Panel(self, wx.ID_ANY)

        randomId = wx.NewId()
        self.Bind(wx.EVT_MENU, self.onKeyCombo, id=randomId)
        accel_tbl = wx.AcceleratorTable([(wx.ACCEL_CTRL, ord('Q'), randomId)])
        self.SetAcceleratorTable(accel_tbl)

    def onKeyCombo(self, event):
        """"""
        print "You pressed CTRL+Q!"

# Run the program
if __name__ == "__main__":
    app = wx.App(False)
    frame = MyForm()
    frame.Show()
    app.MainLoop()
This can end up looking a bit ugly if you have a lot of keyboard shortcuts to add to your application, as you end up with a list of tuples that just looks kind of odd. You will find this way of writing an AcceleratorTable more often than not, though. However, there are other ways to add entries to your AcceleratorTable. Let's take a look at an example from wxPython's documentation:
entries = [wx.AcceleratorEntry() for i in xrange(4)]

entries[0].Set(wx.ACCEL_CTRL, ord('N'), ID_NEW_WINDOW)
entries[1].Set(wx.ACCEL_CTRL, ord('X'), wx.ID_EXIT)
entries[2].Set(wx.ACCEL_SHIFT, ord('A'), ID_ABOUT)
entries[3].Set(wx.ACCEL_NORMAL, wx.WXK_DELETE, wx.ID_CUT)

accel = wx.AcceleratorTable(entries)
frame.SetAcceleratorTable(accel)
Here we create a list of four wx.AcceleratorEntry() objects using a list comprehension. Then we access each of the entries in the list using the Python list’s index to call each entry’s Set method. The rest of the code is pretty similar to what you saw before. Let’s take a moment to make this code actually runnable:
import wx

class MyForm(wx.Frame):

    def __init__(self):
        wx.Frame.__init__(self, None, title="AcceleratorEntry Tutorial",
                          size=(500,500))

        # Add a panel so it looks correct on all platforms
        panel = wx.Panel(self, wx.ID_ANY)

        exit_menu_item = wx.MenuItem(id=wx.NewId(), text="Exit",
                                     helpString="Exit the application")
        about_menu_item = wx.MenuItem(id=wx.NewId(), text='About')

        ID_NEW_WINDOW = wx.NewId()
        ID_ABOUT = wx.NewId()

        self.Bind(wx.EVT_MENU, self.on_new_window, id=ID_NEW_WINDOW)
        self.Bind(wx.EVT_MENU, self.on_about, id=ID_ABOUT)

        entries = [wx.AcceleratorEntry() for i in range(4)]

        entries[0].Set(wx.ACCEL_CTRL, ord('N'), ID_NEW_WINDOW, exit_menu_item)
        entries[1].Set(wx.ACCEL_CTRL, ord('X'), wx.ID_EXIT)
        entries[2].Set(wx.ACCEL_SHIFT, ord('A'), ID_ABOUT, about_menu_item)
        entries[3].Set(wx.ACCEL_NORMAL, wx.WXK_DELETE, wx.ID_CUT)

        accel_tbl = wx.AcceleratorTable(entries)
        self.SetAcceleratorTable(accel_tbl)

    def on_new_window(self, event):
        """"""
        print("You pressed CTRL+N!")

    def on_about(self, event):
        print('You pressed SHIFT+A')

# Run the program
if __name__ == "__main__":
    app = wx.App(False)
    frame = MyForm()
    frame.Show()
    app.MainLoop()
First of all, I want to note that I don’t have all the accelerators hooked up. For example, “CTRL+X” won’t actually exit the program. But I did go ahead and hook up “CTRL+N” and “SHIFT+A”. Try running the code and see how it works.
You can also be slightly more explicit and create your AcceleratorEntry() objects one by one instead of using a list comprehension. Let’s modify our code a bit and see how that works:
import wx

class MyForm(wx.Frame):

    def __init__(self):
        wx.Frame.__init__(self, None, title="AcceleratorEntry Tutorial",
                          size=(500,500))

        # Add a panel so it looks correct on all platforms
        panel = wx.Panel(self, wx.ID_ANY)

        exit_menu_item = wx.MenuItem(id=wx.NewId(), text="Exit",
                                     helpString="Exit the application")
        about_menu_item = wx.MenuItem(id=wx.NewId(), text='About')

        ID_NEW_WINDOW = wx.NewId()
        ID_ABOUT = wx.NewId()

        self.Bind(wx.EVT_MENU, self.on_new_window, id=ID_NEW_WINDOW)
        self.Bind(wx.EVT_MENU, self.on_about, id=ID_ABOUT)

        entry_one = wx.AcceleratorEntry(wx.ACCEL_CTRL, ord('N'),
                                        ID_NEW_WINDOW, exit_menu_item)
        entry_two = wx.AcceleratorEntry(wx.ACCEL_SHIFT, ord('A'),
                                        ID_ABOUT, about_menu_item)

        entries = [entry_one, entry_two]

        accel_tbl = wx.AcceleratorTable(entries)
        self.SetAcceleratorTable(accel_tbl)

    def on_new_window(self, event):
        """"""
        print("You pressed CTRL+N!")

    def on_about(self, event):
        print('You pressed SHIFT+A')

# Run the program
if __name__ == "__main__":
    app = wx.App(False)
    frame = MyForm()
    frame.Show()
    app.MainLoop()
Frankly, I like this version the best as it's the most explicit. The "Zen of Python" advocates doing things explicitly rather than implicitly, so I think this version follows that paradigm well.
Wrapping Up
Now you know a couple of different ways to create keyboard shortcuts (accelerators) for your application. They are very handy and can enhance your application’s usefulness.
Related Reading
- wxPython: Keyboard Shortcuts
- wxPython: Menus, toolbars and accelerators
- wxPython documentation on the wx.AcceleratorTable
- wxPython documentation on wx.AcceleratorEntry
Semaphore Community
Dockerizing a Python Django Web Application
This article is brought with ❤ to you by Semaphore.
Introduction
This article will cover building a simple 'Hello World'-style web application written in Django and running it in the much-discussed Docker. Docker takes all the great aspects of a traditional virtual machine, e.g. a self-contained system isolated from your development machine, and removes many of the drawbacks such as system resource drain, setup time, and maintenance.
When building web applications, you have probably reached a point where you want to run your application in a fashion that is closer to your production environment. Docker allows you to set up your application runtime in such a way that it runs in exactly the same manner as it will in production, on the same operating system, with the same environment variables, and any other configuration and setup you require.
By the end of the article you'll be able to:
- Understand what Docker is and how it is used,
- Build a simple Python Django application, and
- Create a simple Dockerfile to build a container running a Django web application server.
What is Docker, Anyway?
Docker's homepage describes Docker as follows:
"Docker is an open platform for building, shipping and running distributed applications. It gives programmers, development teams, and operations engineers the common toolbox they need to take advantage of the distributed and networked nature of modern applications."
Put simply, Docker gives you the ability to run your applications within a controlled environment, known as a container, built according to the instructions you define. A container leverages your machine's resources much like a traditional virtual machine (VM). However, containers differ greatly from traditional virtual machines in terms of system resources. Traditional virtual machines operate using Hypervisors, which manage the virtualization of the underlying hardware to the VM. This means they are large in terms of system requirements.
Containers operate on a shared Linux operating system base and add simple
instructions on top
to execute and run your application or process. The difference being
that Docker doesn't require the often time-consuming process of
installing an entire OS to a virtual machine such as VirtualBox or
VMWare. Once Docker is installed, you create a container with a few
commands and then execute your applications on it via the Dockerfile. Docker
manages the majority of the
operating system virtualization for you, so you can get on with writing
applications and shipping them as you require in the container you have
built. Furthermore, Dockerfiles can be shared for others to build
containers and extend the instructions within them by basing their
container image on top of an existing one. The containers are also
highly portable and will run in the same manner regardless of the host
OS they are executed on. Portability is a massive plus side of Docker.
Prerequisites
Before you begin this tutorial, ensure the following is installed to your system:
- Python 2.7 or 3.x,
- Docker (Mac users: it's recommended to use docker-machine, available via Homebrew-Cask), and
- A git repository to store your project and track changes.
Setting Up a Django web application
Starting a Django application is easy, as the Django dependency provides you with a command line tool for starting a project and generating some of the files and directory structure for you. To start, create a new folder that will house the Django application and move into that directory.
$ mkdir project
$ cd project
Once in this folder, you need to add the standard Python project
dependencies file, which is usually named requirements.txt, and add the
Django and Gunicorn dependencies to it. Gunicorn is a production-grade web
server, which will be used later in the article. Once you have created and added
the dependencies, the file should look like this:
$ cat requirements.txt
Django==1.9.4
gunicorn==19.6.0
With the Django dependency added, you can then install Django using the following command:
$ pip install -r requirements.txt
Once installed, you will find that you now have access to the
django-admin command line tool, which you can use to generate the
project files and directory structure needed for the
simple "Hello, World!" application.
$ django-admin startproject helloworld
Let's take a look at the project structure the tool has just created for you:
.
├── helloworld
│ ├── helloworld
│ │ ├── __init__.py
│ │ ├── settings.py
│ │ ├── urls.py
│ │ └── wsgi.py
│ └── manage.py
└── requirements.txt
You can read more about the structure of Django on the official website.
The django-admin tool has created a skeleton application. You control the
application for development purposes using the manage.py file, which
allows you, for example, to start the development web server:
$ cd helloworld
$ python manage.py runserver
The other key file of note is urls.py, which specifies which URLs route
to which view. Right now, you will only have the default admin URL,
which we won't be using in this tutorial. Let's add a URL that will
route to a view returning the classic phrase "Hello, World!".
First, create a new file called views.py in the same directory as
urls.py with the following content:
from django.http import HttpResponse
def index(request):
return HttpResponse("Hello, world!")
Now, add the following URL url(r'', 'helloworld.views.index') to the
urls.py, which will route the base URL of / to our new view. The
contents of the urls.py file should now look as follows:
from django.conf.urls import url
from django.contrib import admin
urlpatterns = [
url(r'^admin/', admin.site.urls),
url(r'', 'helloworld.views.index'),
]
Now, when you execute the python manage.py runserver command and visit
http://localhost:8000 in your browser, you should see the newly added
"Hello, World!" view.
The final part of our project setup is making use of the
Gunicorn web server.
This web server is robust and built
to handle production levels of traffic, whereas the included development
server of Django is more for testing purposes on your local
machine only. Once you have dockerized the application,
you will want to start up the server using Gunicorn. This is much
simpler if you write a small startup script for Docker to execute.
With that in mind, let's add a start.sh bash script to the root of the
project, that will start our application using Gunicorn.
#!/bin/bash
# Start Gunicorn processes
echo Starting Gunicorn.
exec gunicorn helloworld.wsgi:application \
--bind 0.0.0.0:8000 \
--workers 3
The first part of the script writes "Starting Gunicorn" to the
command line to show us that it is starting execution. The next part of
the script actually launches Gunicorn. You use exec here so that the
execution of the command takes over the shell script, meaning that when
the Gunicorn process ends so will the script, which is what we want here.
You then pass the gunicorn command with the first argument of
helloworld.wsgi:application. This is a reference to the wsgi file
Django generated for us and is a Web Server
Gateway Interface file which is the Python standard for
web applications and servers. Without delving too much into WSGI, the
file simply defines the application variable, and Gunicorn knows how to
interact with the object to start the web server.
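For reference, the wsgi.py that django-admin generates is only a few lines; roughly, for Django 1.9, it looks like the sketch below (reproduced from memory, so treat it as an approximation rather than the exact generated file).
import os

from django.core.wsgi import get_wsgi_application

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "helloworld.settings")

# this module-level variable is what "helloworld.wsgi:application" points at
application = get_wsgi_application()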
You then pass two flags to the command, bind to attach the running
server to port 8000, which you will use to communicate with the running
web server via HTTP. Finally, you specify workers, which is the
number of worker processes that will handle the requests coming into your
application. Gunicorn recommends this value to be set at
(2 x $num_cores) + 1. You can read more on configuration of
Gunicorn in their
documentation.
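If you want to compute that recommended value on the host itself, a quick sketch using only the standard library might look like this (a sketch, not part of the article's start.sh):
import multiprocessing

# (2 x number of cores) + 1, as suggested by the Gunicorn docs
workers = multiprocessing.cpu_count() * 2 + 1
print(workers)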
Finally, make the script executable, and then test if it works by changing
directory into the project folder helloworld and executing the script
as shown here. If everything is working fine, you should see similar output to
the one below, be able to visit http://localhost:8000 in your browser, and
get the "Hello, World!" response.
$ chmod +x start.sh
$ cd helloworld
$ ../start.sh
Starting Gunicorn.
[2016-06-26 19:43:28 +0100] [82248] [INFO]
Starting gunicorn 19.6.0
[2016-06-26 19:43:28 +0100] [82248] [INFO]
Listening at: http://0.0.0.0:8000 (82248)
[2016-06-26 19:43:28 +0100] [82248] [INFO]
Using worker: sync
[2016-06-26 19:43:28 +0100] [82251] [INFO]
Booting worker with pid: 82251
[2016-06-26 19:43:28 +0100] [82252] [INFO]
Booting worker with pid: 82252
[2016-06-26 19:43:29 +0100] [82253] [INFO]
Booting worker with pid: 82253
Dockerizing the Application
You now have a simple web application that is ready to be deployed. So far, you have been using the built-in development web server that Django ships with the web framework it provides. It's time to set up the project to run the application in Docker using a more robust web server that is built to handle production levels of traffic.
Installing Docker
One of the key goals of Docker is portability, and as such it can be installed on a wide variety of operating systems.
For this tutorial, you will look at installing Docker Machine on MacOS. The simplest way to achieve this is via the Homebrew package manager. Install Homebrew and run the following:
$ brew update && brew upgrade --all && brew cleanup && brew prune
$ brew install docker-machine
With Docker Machine installed, you can use it to create some virtual
machines and run Docker clients. You can run docker-machine from your
command line to see what options you have available. You'll notice that
the general idea of docker-machine is to give you tools to create and
manage Docker clients. This means you can easily spin up a virtual
machine and use that to run whatever Docker containers you want or need
on it.
You will now create a virtual machine based on VirtualBox that will be
used to execute your Dockerfile, which you will create shortly.
The machine you create here should try to mimic the machine
you intend to run your application on in production. This way, you
should not see any differences or quirks in your running application,
either locally or in a deployed environment.
Create your Docker Machine using the following command:
$ docker-machine create development --driver virtualbox \
    --virtualbox-disk-size "5000" --virtualbox-cpu-count 2 \
    --virtualbox-memory "4096"
This will create your machine and output useful information on completion. The machine will be created with a 5GB hard disk, 2 CPUs, and 4GB of RAM.
To complete the setup, you need to add some environment variables to
your terminal session to allow the Docker command to connect to the machine
you have just created. Handily, docker-machine provides a simple way
to generate the environment variables and add them to your session:
$ docker-machine env development
export DOCKER_TLS_VERIFY="1"
export DOCKER_HOST="tcp://123.456.78.910:1112"
export DOCKER_CERT_PATH="/Users/me/.docker/machine/machines/development"
export DOCKER_MACHINE_NAME="development"
# Run this command to configure your shell:
# eval "$(docker-machine env development)"
Complete the setup by executing the command suggested at the end of the output:
eval "$(docker-machine env development)"
Execute the following command to ensure everything is working as expected.
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
You can now dockerize your Python application and get it running using
the docker-machine.
Writing the Dockerfile
The next stage is to add a Dockerfile to your project. This will
allow Docker to build the image it will execute on the Docker Machine
you just created. Writing a Dockerfile is rather
straightforward and has many elements that can be reused and/or found on
the web. Docker provides a lot of the functions that you will
require to build your image. If you need to do something more custom on
your project, Dockerfiles are flexible enough for you to do so.
The structure of a Dockerfile can be considered a series of
instructions on how to build your container/image. For example, the vast
majority of Dockerfiles will begin by referencing a base image provided
by Docker. Typically, this will be a plain vanilla image of the latest
Ubuntu release or other Linux OS of choice. From there, you can set up
directory structures, environment variables, download
dependencies, and many other standard system tasks before finally
executing the process which will run your web application.
Start the Dockerfile by creating an empty file named Dockerfile in
the root of your project. Then, add the first line to the Dockerfile
that instructs which base image to build upon. You can create your own base
image and use that for your containers, which can be beneficial in a department
with many teams wanting to deploy their applications in the same way.
# Dockerfile
# FROM directive instructing base image to build upon
FROM python:2-onbuild
It's worth noting that we are using a base image that has been
created specifically to handle Python 2.X applications, and that carries a set of
instructions that will run automatically before the rest of your
Dockerfile. This base image will copy your project to
/usr/src/app, copy your requirements.txt and execute pip install
against it. With these tasks taken care of for you, your Dockerfile
can then prepare to actually run your application.
Next, you can copy the start.sh script written earlier to a path that
will be available to you in the container to be executed later in the
Dockerfile to start your server.
# COPY startup script into known file location in container
COPY start.sh /start.sh
Your server will run on port 8000. Therefore, your container must be set up
to allow access to this port so that you can communicate to your running
server over HTTP. To do this, use the EXPOSE directive to make the
port available:
# EXPOSE port 8000 to allow communication to/from server
EXPOSE 8000
The final part of your Dockerfile is to execute the start
script added earlier, which will leave your web server running on port
8000 waiting to take requests over HTTP. You can execute this script
using the CMD directive.
# CMD specifies the command to execute to start the server running.
CMD ["/start.sh"]
# done!
With all this in place, your final Dockerfile should look something
like this:
# Dockerfile
# FROM directive instructing base image to build upon
FROM python:2-onbuild
# COPY startup script into known file location in container
COPY start.sh /start.sh
# EXPOSE port 8000 to allow communication to/from server
EXPOSE 8000
# CMD specifies the command to execute to start the server running.
CMD ["/start.sh"]
# done!
You are now ready to build the container image, and then run it to see it all working together.
Building and Running the Container
Building the container is very straightforward once you have Docker and
Docker Machine on your system. The following command will look for your
Dockerfile and download all the necessary layers required to get your
container image running. Afterwards, it will run the instructions in the
Dockerfile and leave you with a container that is ready to start.
To build your container, you will use the docker build command and
provide a tag or a name for the container, so you can reference it later
when you want to run it. The final part of the command tells Docker
which directory to build from.
$ cd <project root directory>
$ docker build -t davidsale/dockerizing-python-django-app .
Sending build context to Docker daemon 237.6 kB
Step 1 : FROM python:2-onbuild
# Executing 3 build triggers...
Step 1 : COPY requirements.txt /usr/src/app/
---> Using cache
Step 1 : RUN pip install --no-cache-dir -r requirements.txt
---> Using cache
Step 1 : COPY . /usr/src/app
---> 68be8680cbc4
Removing intermediate container 75ed646abcb6
Step 2 : COPY start.sh /start.sh
---> 9ef8e82c8897
Removing intermediate container fa73f966fcad
Step 3 : EXPOSE 8000
---> Running in 14c752364595
---> 967396108654
Removing intermediate container 14c752364595
Step 4 : WORKDIR helloworld
---> Running in 09aabb677b40
---> 5d714ceea5af
Removing intermediate container 09aabb677b40
Step 5 : CMD /start.sh
---> Running in 7f73e5127cbe
---> 420a16e0260f
Removing intermediate container 7f73e5127cbe
Successfully built 420a16e0260f
In the output, you can see Docker processing each one of your commands before outputting that the build of the container is complete. It will give you a unique ID for the container, which can also be used in commands alongside the tag.
The final step is to run the container you have just built using Docker:
$ docker run -it -p 8000:8000 davidsale/djangoapp1
Starting Gunicorn.
[2016-06-26 19:24:11 +0000] [1] [INFO]
Starting gunicorn 19.6.0
[2016-06-26 19:24:11 +0000] [1] [INFO]
Listening at: http://0.0.0.0:9077 (1)
[2016-06-26 19:24:11 +0000] [1] [INFO]
Using worker: sync
[2016-06-26 19:24:11 +0000] [11] [INFO]
Booting worker with pid: 11
[2016-06-26 19:24:11 +0000] [12] [INFO]
Booting worker with pid: 12
[2016-06-26 19:24:11 +0000] [17] [INFO]
Booting worker with pid: 17
The command tells Docker to run the container and forward the exposed
port 8000 to port 8000 on your local machine. After you run
this command, you should be able to visit http://localhost:8000
in your browser to see the "Hello, World!" response. If you were
running on a Linux machine, that would be the case. However, if running
on MacOS, then you will need to forward the ports from VirtualBox, which
is the driver we use in this tutorial so that they are accessible on
your host machine.
$ VBoxManage controlvm "development" natpf1 \
    "tcp-port8000,tcp,,8000,,8000"
This command modifies the configuration of the virtual machine created
using docker-machine earlier to forward port 8000 to your host
machine. You can run this command multiple times changing the values
for any other ports you require.
Once you have done this, visit http://localhost:8000 in your browser.
You should be able to visit your dockerized Python Django
application running on a Gunicorn web server, ready to take thousands of
requests a second and ready to be deployed on virtually
any OS on the planet using Docker.
Next Steps
After manually verifying that the application is behaving as expected in Docker, the next step is the deployment. You can use Semaphore's Docker platform for automating this process.
Continuous Integration and Deployment for Docker projects on Semaphore
As a first step you need to create a free Semaphore account. Then, connect your Docker project repository to your new account. Semaphore will recognize that you're using Docker, and will automatically recommend the Docker platform for it.
The last step is to specify commands to build and run your Docker images:
docker build -t <your-project> .
docker run <your-project>
Semaphore will execute these commands on every git push.
Semaphore also makes it easy to push your images to various Docker container registries. To learn more about getting the most out of Docker on Semaphore, check out our Docker documentation pages.
Conclusion
In this tutorial, you have learned how to build a simple Python Django web application, wrap it in a production grade web server, and create a Docker container to execute your web server process.
If you enjoyed working through this article, feel free to share it and if you have any questions or comments leave them in the section below. We will do our best to answer them, or point you in the right direction.
This article is brought with ❤ to you by Semaphore.
EuroPython
EuroPython 2017: Videos for Monday available online
We are pleased to announce the first batch of cut videos for EuroPython 2017.
To see the videos, please head over to our EuroPython YouTube channel and select the “EuroPython 2017” playlist.
Here’s our brand new teaser video for the conference:
In the coming weeks, we will release the other videos, in batches of one conference day per week.
Enjoy,
–
EuroPython 2017 Team
EuroPython Society
EuroPython 2017 Conference
hypothesis.works articles
The Threshold Problem
In my last post I mentioned the problem of bug slippage: When you start with one bug, reduce the test case, and end up with another bug.
I’ve run into another related problem twice now, and it’s not one I’ve seen talked about previously.
The problem is this: Sometimes shrinking makes a bug seem much less interesting than it actually is.
Read more...
Fredrik Håård's Blaag
Provisioning Elasticsearch and Kibana with Terraform
Fair warning up front: this is not a Terraform, AWS, or Elasticsearch tutorial. You'll need to know a bit or read the docs to apply the examples.
When I wanted to add the AWS version of ELK (Elasticsearch, Logstash, Kibana), which is Elasticsearch, Cloudwatch and Kibana, I hit a road block: Terraform did not natively support provisioning the actual streaming of logs from Cloudwatch to Elasticsearch. Googling led me approximately nowhere, and I had to devise a solution from scratch.
Automating setting up Cloudwatch once hosts are set up is reasonably straightforward using e.g. Ansible, so it's the Lambda function to parse messages and send them to Elasticsearch that is the problem. It's super simple to set up if you follow the docs and click through the AWS console, but there are few hints on how you would automate it reliably.
Terraform supports setting up Cloudwatch log subscription filters and Elasticsearch clusters/domains as well as Lambda functions, and once you know what to put in the Lambda, the process is just plain old Terraform. While the solution is not very sophisticated, it works for me ^(tm) and I believe it might work for others as well.
As so often in software, the solution is to copy something that works. In this case, I created a subscription filter from some random Cloudwatch group to a freshly created Elasticsearch domain on a test account, and then checked the resulting (NodeJS) code for the generated Lambda function. To my delight, the code has almost no environment- or target specific contents except for the Elasticsearch endpoint itself, so there's not much that needs to be done to adapt it for any setup.
In this case, I simply replaced references to the endpoint with references to an environment variable ES_ENDPOINT, which I then inject when creating the Lambda function.
Putting the resulting index.js (or whatever) into a zip file, we can then provision it using Terraform:
resource "aws_lambda_function" "logs-to-es-lambda" {
filename = "files/es_logs_lambda.zip"
description = "CloudWatch Logs to Amazon ES"
function_name = "Logs_To_Elasticsearch"
role = "${aws_iam_role.my-es-execution-role.arn}"
handler = "index.handler"
source_code_hash = '${base64sha256(file("files/es_logs_lambda.zip"))}'
runtime = "nodejs4.3"
timeout = 60
environment {
variables = {
ES_ENDPOINT = "${aws_elasticsearch_domain.my-es-domain.endpoint}"
}
}
}
With the Lambda function in place, we can create a log subscription filter to send events to it:
resource "aws_lambda_permission" "cloudwatch-lambda-permission" {
statement_id = "allow-cloudwatch-lambda"
action = "lambda:InvokeFunction"
function_name = "${aws_lambda_function.logs-to-es-lambda.arn}"
principal = "logs.${var.my-aws-region}.amazonaws.com"
source_arn = "${aws_cloudwatch_log_group.my-log-group.arn}"
}
resource "aws_cloudwatch_log_subscription_filter" "logs-subscription" {
name = "ElasticsearchStream-logs"
depends_on = ["aws_lambda_permission.cloudwatch-lambda-permission"]
log_group_name = "${aws_cloudwatch_log_group.my-log-group.name}"
filter_pattern = "[timestamp, level, thread, name, message]"
destination_arn = "${aws_lambda_function.logs-to-es-lambda.arn}"
}
The filter pattern follows the same rules as in the AWS console, and I'd generally use the console and some example data to try it out since I find the documentation incredibly hard to find and parse.
For a full example, you'd of course need to provision the role (aws_iam_role), the domain (aws_elasticsearch_domain), and stream some interesting logs to the log group! In addition, the first time I did this I hooked up a quick Python hack to inject the endpoint and zip the JavaScript source on the fly, but I don't think that makes much sense in retrospect. For the last couple of projects I've done this in, I've simply committed the ZIP to the source repository instead.
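For completeness, building the ZIP from Python takes only a few lines; this is a hypothetical sketch of what such a hack might look like, reusing the file names from the Terraform snippet above.
import zipfile

# package index.js into the artifact referenced by the Terraform resource
with zipfile.ZipFile('files/es_logs_lambda.zip', 'w', zipfile.ZIP_DEFLATED) as zf:
    zf.write('index.js')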
September 27, 2017
Stack Abuse
Command Line Arguments in Python
Overview
Computer programs are written with a specific purpose in mind. Tools on UNIX/Linux systems follow the idea of specialization: one tool for one task, done as well as possible. Nevertheless, as with other tools, this allows you to combine single programs to create powerful tool chains.
With the help of command line arguments that are passed to programs, you can deal with much more specific use cases. Command line arguments allow you to enable programs to act in a certain way, for example to output additional information, or to read data from a specified source, and to interpret this data in a desired format.
In general, operating systems accept arguments in a certain notation, for example:
- UNIX: "-" followed by a letter, like "-h"
- GNU: "--" followed by a word, like "--help"
- Microsoft Windows: "/" followed by either a letter, or word, like "/help"
These different approaches exist for historical reasons. Many programs on UNIX-like systems support the UNIX way, the GNU way, or both. The UNIX notation is mostly used with single-letter options, while the GNU notation presents a more readable options list that is particularly useful for documenting what is being run.
Keep in mind that both the name and the meaning of an argument are specific to a program - there is no general definition, only a few conventions like --help for further information on the usage of the tool. As the developer of a Python script, you decide which arguments are valid and what they stand for. This requires proper evaluation. Read on to learn how to do it using Python.
Handling command line arguments with Python
Python 3 supports four different ways of handling command line arguments. The oldest one is the sys module. In terms of names, and its usage, it relates directly to the C library (libc). The second way is the getopt module which handles both short, and long options, including the evaluation of the parameter values.
Furthermore, two lesser-known ways exist: the argparse module, which is derived from the optparse module that was available up to Python 2.7, and the docopt module, which is available from GitHub. All of these modules are fully documented and worth reading.
The sys Module
This is a basic module that was shipped with the Python distribution from the early days on. It has a quite similar approach as the C library using argc/argv to access the arguments. The sys module implements the command line arguments in a simple list structure named sys.argv.
Each list element represents a single argument. The first one -- sys.argv[0] -- is the name of the Python script. The other list elements -- sys.argv[1] to sys.argv[n] -- are the command line arguments 2 through n. A space is used as the delimiter between arguments; argument values that contain a space have to be quoted accordingly.
The equivalent of argc is just the number of elements in the list. To obtain this value, use the Python len() function. Example 2 will explain this in detail.
Example 1
In this first example, we determine the way we were called. This information is kept in the first command line argument, indexed with 0. Listing 1 displays how you obtain the name of your Python script.
Example 1: Determine the name of the Python script
import sys
print ("the script has the name %s" % (sys.argv[0])
Save this code in a file named arguments-programname.py, and then call it as shown in Listing 1. The output is as follows and contains the file name, including its full path.
Listing 1: Call the script
$ python arguments-programname.py
the script has the name arguments-programname.py
$
$ python /home/user/arguments-programname.py
the script has the name /home/user/arguments-programname.py
Example 2
In the second example we simply count the number of command line arguments using the built-in len() function. sys.argv is the list that we have to examine. In Example 2, a value of 1 is subtracted to exclude the script name from the count (argument list counters start from zero). As you may remember from Example 1, the first element contains the name of the Python script, which we skip here.
Example 2: Count the arguments
import sys
# count the arguments
arguments = len(sys.argv) - 1
print ("the script is called with %i arguments" % (arguments))
Save and name this file arguments-count.py. The call is displayed in Listing 2. This includes three different scenarios: a) a call without any further command line arguments, b) a call with two arguments, and c) with two arguments where the second one is a quoted string (a string that contains a space).
Listing 2: Call the script
$ python arguments-count.py
the script is called with 0 arguments
$
$ python arguments-count.py --help me
the script is called with 2 arguments
$
$ python arguments-count.py --option "long string"
the script is called with 2 arguments
Example 3
The third example outputs every single argument the Python script is called with, except the program name itself. Therefore, we loop through the command line arguments starting with the second list element. As stated before, this element has index 1.
Example 3: Output arguments
import sys
# count the arguments
arguments = len(sys.argv) - 1
# output argument-wise
position = 1
while (arguments >= position):
    print ("parameter %i: %s" % (position, sys.argv[position]))
    position = position + 1
In Listing 3 the Python script is named arguments-output.py. As done in Listing 2, the output illustrates three different calls: a) without any arguments, b) with two arguments, and c) also with two arguments where the second argument is a quoted string that consists of two single words, separated by a space.
Listing 3: call the script
$ python arguments-output.py
$
$ python arguments-output.py --help me
parameter 1: --help
parameter 2: me
$
$ python arguments-output.py --option "long string"
parameter 1: --option
parameter 2: long string
$
The getopt Module
As you may have seen before, the sys module only splits the command line string into single facets. The Python getopt module goes a bit further and extends the separation of the input string with parameter validation. Based on the getopt C function, it allows both short and long options, including a value assignment.
In practice, it requires the sys module to process input data properly. To do so, both the sys module and the getopt module have to be loaded beforehand. Next, we remove the first element from the list of input parameters (see Example 4.1), and store the remaining list of command line arguments in a variable called argumentList.
Example 4.1: Preparing the input parameters
# include standard modules
import getopt, sys
# read commandline arguments, first
fullCmdArguments = sys.argv
# - further arguments
argumentList = fullCmdArguments[1:]
print(argumentList)
Now, argumentList can be parsed using the getopt() method. Before doing that, getopt() needs to know about the valid parameters. They are defined like this:
Example 4.2: Preparing the valid parameters
unixOptions = "ho:v"
gnuOptions = ["help", "output=", "verbose"]
This means that these arguments are now seen as the valid ones:
| Long argument | Short argument | With value |
|---|---|---|
| `--help` | `-h` | no |
| `--output` | `-o` | yes |
| `--verbose` | `-v` | no |
Next, this allows you to process the argument list. The getopt() method requires three parameters: the list of remaining arguments, and the valid UNIX and GNU options (see the table above).
The method call itself is kept in a try-except statement to cover errors during the evaluation. An exception is raised if an argument is discovered that is not part of the list defined before (see Example 4.2). The Python script will then print the error message to the screen and exit with error code 2.
Example 4.3: Parsing the argument list
try:
    arguments, values = getopt.getopt(argumentList, unixOptions, gnuOptions)
except getopt.error as err:
    # output error, and return with an error code
    print (str(err))
    sys.exit(2)
Finally, the arguments and their corresponding values are stored in the two variables named arguments and values. Now, you can evaluate these variables (see Example 4.4). The for loop goes through the list of recognized arguments, one entry after the next.
Example 4.4: Specific actions according to the argument
# evaluate given options
for currentArgument, currentValue in arguments:
    if currentArgument in ("-v", "--verbose"):
        print ("enabling verbose mode")
    elif currentArgument in ("-h", "--help"):
        print ("displaying help")
    elif currentArgument in ("-o", "--output"):
        print (("enabling special output mode (%s)") % (currentValue))
In Listing 4 you see the output of the program calls. These calls are displayed with both valid and invalid program arguments.
Listing 4: Testing the getopt() method
$ python arguments-getopt.py -h
displaying help
$
$ python arguments-getopt.py --help
displaying help
$
$ python arguments-getopt.py --output=green --help -v
enabling special output mode (green)
displaying help
enabling verbose mode
$
$ python arguments-getopt.py -verbose
option -e not recognized
$
The argparse Module
The argparse module is available starting with Python 3.2, and is an enhancement of the optparse module that exists up to Python 2.7. The Python documentation contains an API description and a tutorial that covers all the methods in detail.
It offers a command line interface with standardized output, whereas the former two solutions leave most of the work in your hands. argparse allows the verification of fixed and optional arguments, with name checking in either UNIX or GNU style. As a default optional argument, it includes -h, along with its long version --help. This argument is accompanied by a default help message describing the argument.
Example 5 shows the parser initialization, and Listing 5 shows the basic call, followed by the help message. In contrast to the Python calls we used in the previous examples, keep in mind to use Python 3 with these examples.
Example 5: Basic usage of the argparse module
# include standard modules
import argparse
# initiate the parser
parser = argparse.ArgumentParser()
parser.parse_args()
Listing 5: Calling the basic argparse program
$ python3 arguments-argparse-basic.py
$
$ python3 arguments-argparse-basic.py -h
usage: arguments-argparse-basic.py [-h]
optional arguments:
-h, --help show this help message and exit
$
$ python3 arguments-argparse-basic.py --verbose
usage: arguments-argparse-basic.py [-h]
arguments-argparse-basic.py: error: unrecognized arguments: --verbose
$
As the next step, we will add a custom description to the help message. Initializing the parser allows an additional text. Example 6 stores the description in the text variable, which is explicitly given to the argparse class. In Listing 6 you see what the output looks like.
Example 6: Adding a description to the help message
# include standard modules
import argparse
# define the program description
text = 'This is a test program. It demonstrates how to use the argparse module with a program description.'
# initiate the parser with a description
parser = argparse.ArgumentParser(description = text)
parser.parse_args()
Listing 6: Calling the help with a program description
$ python3 arguments-argparse-description.py --help
usage: arguments-argparse-description.py [-h]
This is a test program. It demonstrates how to use the argparse module with a
program description.
optional arguments:
-h, --help show this help message and exit
As the final step we will add an optional argument named -V (UNIX style), which has a corresponding GNU style argument named --version. To do so we use the method add_argument() that we call with three parameters (displayed for --version, only):
- the name of the parameter: `--version`
- the help text for the parameter: `help="show program version"`
- the action (without an additional value): `action="store_true"`
The source code for that is displayed in Example 7. Reading the arguments into the variable called args is done via the parse_args() method of the parser object. Note that you submit both the short and the long version in one add_argument() call. Finally, you check whether the attribute args.version is set and output the version message.
Example 7: Defining an optional argument
# include standard modules
import argparse
# initiate the parser
parser = argparse.ArgumentParser()
parser.add_argument("-V", "--version", help="show program version", action="store_true")
# read arguments from the command line
args = parser.parse_args()
# check for --version or -V
if args.version:
print("this is myprogram version 0.1")
Listing 7 displays the output if you call the program.
$ python3 arguments-argparse-optional.py -V
this is myprogram version 0.1
$
$ python3 arguments-argparse-optional.py --version
this is myprogram version 0.1
The --version argument does not require a value to be given on the command line. That's why we set the action argument to "store_true". In other cases you need an additional value, for example if you specify a certain volume, height, or width. This is shown in the next example. As a default case, please note that all the arguments are interpreted as strings.
Example 8: Defining an optional argument with value
# include standard modules
import argparse
# initiate the parser
parser = argparse.ArgumentParser()
# add long and short argument
parser.add_argument("--width", "-w", help="set output width")
# read arguments from the command line
args = parser.parse_args()
# check for --width
if args.width:
print("set output width to %s" % args.width)
Listing 8 shows different output values. This includes both the short and the long version as well as the help message.
Listing 8: Different output values
$ python3 arguments-argparse-optional2.py -w 10
set output width to 10
$
$ python3 arguments-argparse-optional2.py --width 10
set output width to 10
$
$ python3 arguments-argparse-optional2.py -h
usage: arguments-argparse-optional2.py [-h] [--width WIDTH]
optional arguments:
-h, --help show this help message and exit
--width WIDTH, -w WIDTH
set output width
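Since values arrive as strings by default, argparse can also convert them for you via the type parameter; a brief sketch follows (the default of 80 is only an illustration, not from the article's examples):
import argparse

parser = argparse.ArgumentParser()
# type=int makes argparse convert the value, so args.width is an integer
parser.add_argument("--width", "-w", type=int, default=80, help="set output width")
args = parser.parse_args()

print("set output width to %i" % args.width)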
Conclusion
In this article we showed many different methods of retrieving command line arguments in Python, including using sys, getopt, and argparse. These modules vary in functionality -- some provide much more than others. sys is fully flexible, whereas both getopt and argparse require some structure. In contrast, they cover most of the complex work that sys leaves in your hands. After working through the examples provided, you should be able to determine which module suits your project best.
In this article we did not talk about the docopt module -- we just mentioned it. This module follows a totally different approach, and will be explained in detail in one of the next articles.
Acknowledgements
The author would like to thank Gerold Rupprecht for his support and critique while preparing this article.
PyCharm
PyCharm 2017.3 EAP 3 Out now!
The latest and greatest early access program (EAP) version of PyCharm is now available from our website:
New in this version
__all__ warnings
Although Python doesn’t have support for private members, there are a couple of ways to indicate to developers that something shouldn’t be used. In addition to prefixing a method with an underscore, you can also explicitly specify public members with __all__. PyCharm can now warn you if you inadvertently use a member that’s excluded. Just be sure to enable the “Access to a protected member of a class or module” inspection in Settings | Editor | Inspections.
Improved Completion for JOIN statements
Before, we would suggest only the table names; now we'll complete the entire JOIN statement - provided you have your foreign keys properly set up, that is:
Scripts for create-react-app
When creating a new react app, you can pass a scripts option to the utility to add support for additional features to your new application. This is now supported in PyCharm:
Further Improvements
- We fixed an issue where sometimes we couldn’t find the step definition from a Lettuce .feature file
- When adding an `__all__` to your modules and packages, you'll now have completion for applicable items
- Improved CSS editing support
- The `PIVOT` keyword for Oracle, and the `GROUP BY ROLLUP` and `GROUP BY CUBE` constructs for MS SQL Server are now supported
- And more, have a look at the release notes for details
If these features sound interesting to you, try them yourself:
As a reminder, PyCharm EAP versions:
- Are free, including PyCharm Professional Edition EAP
- Will work for 30 days from being built, you’ll need to update when the build expires
If you run into any issues with this version, or another version of PyCharm, please let us know on our YouTrack. If you have other suggestions or remarks, you can reach us on Twitter, or by commenting on the blog.
PyCon
PyCon 2018 Call for Proposals is Open!
It’s here! PyCon 2018’s Call for Proposals has officially opened for talks, tutorials, posters, and education summit presentations. PyCon is made by you, so we need you to share what you’re working on, how you’re working on it, what you’ve learned, what you’re learning, and so much more.
Before we dive in, the deadlines:
- Tutorial proposals are due November 24, 2017.
- Talk, Poster, and Education Summit proposals are due January 3, 2018.
If you’re reading this post, you should write a proposal. PyCon is about uniting and building the Python community, and we won’t advance as an open community if we’re not open with each other about what we’ve learned throughout our time in it. It isn’t about being the smartest one in the room, so we don’t just pick all of the expert talks. It’s about helping everyone move together. “A rising tide lifts all boats,” if you will.
We need beginner, intermediate, and advanced proposals on all sorts of topics. We also need beginner, intermediate, and advanced speakers to give said presentations. You don’t need to be a 20 year veteran who has spoken at dozens of conferences. On all fronts, we need all types of people. That’s what this community is comprised of, so that’s what this conference’s schedule should be made from.
When should you write your proposal? As soon as possible!
What we need now is for your submissions to start rolling in. We review proposals as soon as they’re entered, maximizing your time in front of the program committee before they begin voting to determine the schedule. While we accept proposals right up to the deadline, the longer your proposal has been available for review, the better we can help you make it. That extra help goes a long way when you consider the large volume of proposals we anticipate receiving.
For PyCon 2017, we received 583 talk proposals, which makes for a 16% acceptance rate. The tutorial acceptance rate was at 27%, with 117 submissions.
Who can help you with your proposal? A lot of people!
Outside of our program committee, a great source of assistance with proposals comes from your local community. User groups around the world have had sessions where people bring ideas to the table and walk away with a full fledged proposal. These sessions are especially helpful if you’re new to the process, and if you’re experienced with the process, it’s a great way for you to reach out and help people level up. We’ll be sure to share these events as we find out about them, and be sure to tell us your plans if you want to host a proposal event of your own!
We’re also trying something new for 2018 where we provide a mechanism to connect willing mentors with those seeking assistance through our site, helping not only with the brainstorming process but also with the proposal, slides, and presentation itself. Read on to find out more and check out the “Mentoring” section of https://us.pycon.org/2018/speaking/talks/.
Where should you submit your proposal? In your dashboard!
After you’ve created an account at https://us.pycon.org/2018/account/signup/, you’ll want to create a speaker profile in your dashboard. While there, enter some details about yourself and check the various boxes about giving or receiving mentorship, as well as grant needs. Like proposals, you can come back and edit this later.
After that’s done, clicking on the “Submit a new proposal” button gives you the choice of proposal type, and from there you enter your proposal. We’ve provided some guidelines on the types of proposals you can submit, so please be sure to check out the following pages for more information:
We look forward to seeing all of your proposals in the coming months!
Written by Brian Curtin
Edited by Kaitlin Dershaw Durbin
Doug Hellmann
os.path — Platform-independent Manipulation of Filenames — PyMOTW 3
Writing code to work with files on multiple platforms is easy using the functions included in the os.path module. Even programs not intended to be ported between platforms should use os.path for reliable filename parsing. Read more… This post is part of the Python Module of the Week series for Python 3. See PyMOTW.com for …
Continue reading "os.path — Platform-independent Manipulation of Filenames — PyMOTW 3"
Test and Code
31: I'm so sick of the testing pyramid
What started as a twitter disagreement carries over into this civil discussion of software testing.
Brian and Paul discuss testing practices such as the testing pyramid, TDD, unit testing, system testing, and balancing test effort.
- the Testing Pyramid
- the Testing Column
- TDD
- unit testing
- balancing unit with system tests, functional tests
- API testing
- subcutaneous testing
- customer facing tests
Special Guest: Paul Merrill.
Sponsored By:
- Test & Code Patreon Supporters: Thank you to Patreon Supporters
- Nerdlettering: Love Python? Show It With Some Python Swag Custom-made Mugs and Accessories for Pythonistas, by Pythonistas. Promo Code: TESTCODE
Links:
- Episode 34 - Software and Testing Models with Guest Host Brian Okken - Reflection As A Service — Cross posted to RaaS
- Subcutaneous Test — I use subcutaneous test to mean a test that operates just under the UI of an application.
- The Forgotten Layer of the Test Automation Pyramid — At the base of the test automation pyramid is unit testing.
- The Dreyfus model of skill acquisition — The Five-Stage Model of Adult Skill Acquisition
Yasoob Khalid
Intermediate Python Book Anniversary
Hopefully this will be the last update regarding my book for a while. It has been 2 years ⏱ since I self-published my “Intermediate Python” book. In just a short span of 2 years I cannot thank Allah enough for the level of success the book has achieved.
It has been translated into Chinese, Russian, Portuguese and Korean. A couple of days ago it hit the 520,000 readers mark. Yes, you read that right! The book has been read by half a million people to date. This is just for the English version and I am pretty sure the Chinese version has been read by a lot more people.
The book has been opened in almost every part of the world at least once. The most rewarding experience is when you meet someone and they casually tell you how they came across your book online and actually learned something new from it.
Just wanted to thank each and every one of you who supported me. Do remember me in your prayers as I embark on new adventures.
I would also like to thank one very important person. This project wouldn’t have been possible without the inspiration and support I got from Daniel Roy Greenfeld. He has been a great mentor and I can always rely on his wisdom whenever in doubt.
Homepage: http://yasoob.me
Book link: http://book.pythontips.com
Full Stack Python
Setting up PostgreSQL with Python 3 and psycopg on Ubuntu 16.04
PostgreSQL is a powerful open source relational database frequently used to create, read, update and delete Python web application data. Psycopg2 is a PostgreSQL database driver that serves as a Python client for access to the PostgreSQL server. This post explains how to install PostgreSQL on Ubuntu 16.04 and run a few basic SQL queries within a Python program.
We won't cover object-relational mappers (ORMs) in this tutorial but these steps can be used as a prerequisite to working with an ORM such as SQLAlchemy or Peewee.
Tools We Need
Our walkthrough should work with either Python 2 or 3 although all the steps were tested specifically with Python 3.5. Besides the Python interpreter, here are the other components we'll use:
- Ubuntu 16.04.2 (these steps should also work fine with other Ubuntu versions)
- pip and virtualenv to handle the psycopg2 application dependency
- PostgreSQL
If you aren't sure how to install pip and virtualenv, review the first few steps of the how to set up Python 3, Bottle and Green Unicorn on Ubuntu 16.04 LTS guide.
Install PostgreSQL
We'll install PostgreSQL via the apt package manager. There are a few
packages we need since we want to both run PostgreSQL and use the psycopg2
driver with our Python programs. PostgreSQL will also be installed as a
system service so we can start, stop and reload its configuration when
necessary with the service command. Open the terminal and run:
sudo apt-get install postgresql libpq-dev postgresql-client postgresql-client-common
Enter your sudo password when prompted and enter 'yes' when apt asks
if you want to install the new packages.

After a few moments apt will finish downloading, installing and
processing.

We now have PostgreSQL installed and the PostgreSQL service is running
in the background. However, we need to create a user and a database instance
to really start using it. Use the sudo command to switch to the new
"postgres" account.
sudo -i -u postgres
Within the "postgres" account, create a user from the command line with the
createuser command. PostgreSQL will prompt you with several questions.
Answer "n" to superuser and "y" to the other questions.
createuser matt -P --interactive

Awesome, now we have a PostgreSQL user that matches our Ubuntu login account. Exit the postgres account by pressing "Ctrl" and "d" together (Ctrl+D). We're back in our own user account.
Create a new database we can use for testing. You can name it "testpython" or whatever you want for your application.
createdb testpython
Now we can interact with "testpython" via the PostgreSQL command line tool.
Interacting with PostgreSQL
The psql command line client is useful for connecting directly to our
PostgreSQL server without any Python code. Try out psql by using this
command at the prompt:
psql
The PostgreSQL client will connect to the localhost server. The client is now ready for input:

Give PostgreSQL's command prompt a try with commands such as \dt and \dd. We can also run SQL queries, although they won't give us back any data yet because we have not created any tables or inserted any data into the database. A full list of PostgreSQL commands can be found in the psql documentation.
Installing psycopg2
Now that PostgreSQL is installed and we have a non-superuser account, we
can install the psycopg2 package. Let's
figure out where our python3 executable is located, create a virtualenv
with python3, activate the virtualenv and then install the psycopg2 package
with pip. Find your python3 executable using the which command.
which python3
The output will show the full path to the python3 executable, typically /usr/bin/python3 on Ubuntu.
Create a new virtualenv in either your home directory or wherever you
store your Python virtualenvs. Specify the full path to your python3
installation.
# specify the system python3 installation
virtualenv --python=/usr/bin/python3 venvs/postgrestest
Activate the virtualenv.
source ~/venvs/postgrestest/bin/activate
Next we can install the psycopg2 Python package from
PyPI using the pip command.
pip install psycopg2

Sweet, we've got our PostgreSQL driver installed in our virtualenv! We can now test out the installation by writing a few lines of Python code.
Using PostgreSQL from Python
Launch the Python REPL with the python or python3 command. You can also
write the following code in a Python file such as "testpostgres.py" then
execute it with python testpostgres.py. Make sure to replace the "user"
and "password" values with your own.
import psycopg2

try:
    connect_str = "dbname='testpython' user='matt' host='localhost' " + \
                  "password='myOwnPassword'"
    # use our connection values to establish a connection
    conn = psycopg2.connect(connect_str)
    # create a psycopg2 cursor that can execute queries
    cursor = conn.cursor()
    # create a new table with a single column called "name"
    cursor.execute("""CREATE TABLE tutorials (name char(40));""")
    # run a SELECT statement - no data in there, but we can try it
    cursor.execute("""SELECT * from tutorials""")
    rows = cursor.fetchall()
    print(rows)
except Exception as e:
    print("Uh oh, can't connect. Invalid dbname, user or password?")
    print(e)
When we run the above code we won't get anything fancy, just an empty list printed out. However, in those few lines of code we've ensured our connection to our new database works and we can create new tables in it as well as query them.
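As a next step beyond the snippet above, here is a small sketch (my own addition, not from the original post) showing how you might insert a row with a parameterized query and commit so the changes actually persist; it reuses the hypothetical connection values from the example.

import psycopg2

# same connection values as above (replace user and password with your own)
conn = psycopg2.connect("dbname='testpython' user='matt' host='localhost' "
                        "password='myOwnPassword'")
cursor = conn.cursor()
# IF NOT EXISTS keeps this runnable whether or not the earlier CREATE TABLE was committed
cursor.execute("CREATE TABLE IF NOT EXISTS tutorials (name char(40));")
# a parameterized query lets psycopg2 handle quoting and escaping for us
cursor.execute("INSERT INTO tutorials (name) VALUES (%s);",
               ("PostgreSQL with Python 3",))
conn.commit()  # without commit() the changes are rolled back when the connection closes
cursor.execute("SELECT name FROM tutorials;")
print(cursor.fetchall())
cursor.close()
conn.close()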

That's just enough of a hook to get started writing more complicated SQL queries using psycopg2 and PostgreSQL. Make sure to check out the PostgreSQL, relational databases and object-relational mappers (ORMs) pages for more tutorials.
Questions? Tweet @fullstackpython or post a message on the Full Stack Python Facebook page.
See something wrong in this post? Fork this page's source on GitHub and submit a pull request.
Brad Lucas
Ads Txt Files
Recently, my major focus has been AdTech. With this I came across the Ads.txt project. This project is a simple initiative proposed as a way to help publishers ensure inventory is sold only through authorized dealers and partners.
Publishers
Publishers create and add a file called ads.txt at the root of their sites. This text file contains a list of names and information about authorized ad networks, SSPs and exchanges that have permission to sell the publisher's inventory.
Buyers
Buyers, when purchasing inventory through an exchange, can go to the publisher's domain and check that the exchange is authorized to sell the publisher's inventory by reading the publisher's ads.txt file.
Reference Crawler
The folks over at the IAB Tech Lab have released a reference implementation in Python of a simple crawler for Ads.txt files. You pass it a list of domains and it reads the ads.txt from each site and adds the contents of each to a local SQLite database.
Here is the repo.
As of this writing the master branch doesn't work. If you look at my fork of the project you'll see that the fix is to simply comment out the ADSYSTEM_DOMAIN column from the database creation script. See the following for details.
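To give a feel for what the crawler deals with, here is a small sketch of my own (not the IAB reference implementation) that fetches a domain's ads.txt and splits each record into its fields; the field layout follows the public spec (exchange domain, publisher account ID, relationship, optional certification authority ID), and the example.com domain is just a placeholder.

from urllib.request import urlopen

def fetch_ads_txt(domain):
    # ads.txt lives at the root of the publisher's site
    with urlopen('http://%s/ads.txt' % domain) as response:
        return response.read().decode('utf-8', errors='replace')

def parse_ads_txt(text):
    records = []
    for line in text.splitlines():
        line = line.split('#', 1)[0].strip()  # drop comments and blank lines
        if not line or '=' in line:           # skip variables such as contact=...
            continue
        fields = [field.strip() for field in line.split(',')]
        if len(fields) >= 3:                  # domain, account id, relationship[, cert id]
            records.append(fields)
    return records

# placeholder domain -- swap in any publisher that actually serves an ads.txt
print(parse_ads_txt(fetch_ads_txt('example.com'))[:5])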
Links
For more reference, here is a collection of links to review:
- IAB Tech Lab - Ads.txt Spec
- Setup An Ads.txt Web Crawler
- State Of Ads.txt Adoption
- WTF is ads.txt?
- Ads.txt Could Wipe Out a Legion of Programmatic Ad Players.
- AppNexus Support for Ads.txt
- OpenX - Ads.txt for Demand Partners
- Ads.txt by IAB Tech Lab 101
Jeff Hinrichs
MicroPython on ESP32: Tools – esptool.py
These are my notes on using some MicroPython-specific tools with an ESP32-DevKitC board.
These notes are for v2.1 of esptool, an ESP8266 and ESP32 serial bootloader utility.
esptool has a number of functions, but I will only speak to those features required to identify the chip, get flash information and load the MicroPython firmware. See the docs for more information.
Installing esptool
(myproject) $ pip install esptool
Confirm install
(myproject) $ esptool.py version
Display chip information (chip_id)
(myproject) $ esptool.py -p /dev/ttyUSB0 chip_id
esptool.py v2.1
Connecting......
Detecting chip type... ESP32
Chip is ESP32D0WDQ6 (revision 1)
Uploading stub...
Running stub...
Stub running...
Chip ID: 0x7240ac40964
Hard resetting...
Display Flash memory information (flash_id)
(myproject) $ esptool.py -p /dev/ttyUSB0 flash_id
esptool.py v2.1
Connecting....
Detecting chip type... ESP32
Chip is ESP32D0WDQ6 (revision 1)
Uploading stub...
Running stub...
Stub running...
Manufacturer: c8
Device: 4016
Detected flash size: 4MB
Hard resetting...
Display MAC address of wifi adapter (read_mac)
(myproject) $ esptool.py -p /dev/ttyUSB0 read_mac
esptool.py v2.1
Connecting....
Detecting chip type... ESP32
Chip is ESP32D0WDQ6 (revision 1)
Uploading stub...
Running stub...
Stub running...
MAC: 24:0a:c4:09:64:c8
Hard resetting...
Loading MicroPython Firmware
You will need MicroPython firmware http://micropython.org/download#esp32
I download to a directory named images in my project folder. Since the ESP32 code is under development, I check out the GitHub commit page for the chip for any interesting new bits.
When loading to a board that does not already have MicroPython loaded, you should erase the entire flash before flashing the MicroPython firmware.
(myproject) $ esptool.py -p /dev/ttyUSB0 erase_flash
esptool.py v2.1
Connecting....
Detecting chip type... ESP32
Chip is ESP32D0WDQ6 (revision 1)
Uploading stub...
Running stub...
Stub running...
Erasing flash (this may take a while)...
Chip erase completed successfully in 5.0s
Hard resetting...
Now load the firmware with the write_flash command
The general form is:
esptool.py write_flash -p <port> -z <address> <filename>

-p                     specify the port
<port>                 the port to use, e.g. /dev/ttyUSB0
-z                     compress data in transfer (default unless --no-stub is specified)
<address> <filename>   address followed by binary filename, separated by a space
(myproject) $ esptool.py -p /dev/ttyUSB0 write_flash -z 0x1000 images/esp32-20170916-v1.9.2-272-g0d183d7f.bin
esptool.py v2.1
Connecting....
Detecting chip type... ESP32
Chip is ESP32D0WDQ6 (revision 1)
Uploading stub...
Running stub...
Stub running...
Configuring flash size...
Auto-detected Flash size: 4MB
Compressed 902704 bytes to 566927...
Wrote 902704 bytes (566927 compressed) at 0x00001000 in 50.0 seconds (effective 144.4 kbit/s)...
Hash of data verified.
Leaving...
Hard resetting...
Verify the firmware loaded correctly
(myproject) $ miniterm.py --raw /dev/ttyUSB0 115200
--- Miniterm on /dev/ttyUSB0 115200,8,N,1 ---
--- Quit: Ctrl+] | Menu: Ctrl+T | Help: Ctrl+T followed by Ctrl+H ---
>>>
Now do a hard reset using the reset button on the board
>>>
ets Jun 8 2016 00:22:57
rst:0x1 (POWERON_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
flash read err, 1000
ets_main.c 371
ets Jun 8 2016 00:22:57
rst:0x10 (RTCWDT_RTC_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0010,len:4
load:0x3fff0014,len:4268
load:0x40078000,len:0
load:0x40078000,len:10648
entry 0x4007a56c
I (982) cpu_start: Pro cpu up.
I (983) cpu_start: Single core mode
I (984) heap_init: Initializing. RAM available for dynamic allocation:
I (994) heap_init: At 3FFAE2A0 len 00001D60 (7 KiB): DRAM
I (1013) heap_init: At 3FFD4158 len 0000BEA8 (47 KiB): DRAM
I (1032) heap_init: At 3FFE0440 len 00003BC0 (14 KiB): D/IRAM
I (1052) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
I (1072) heap_init: At 4008F3A8 len 00010C58 (67 KiB): IRAM
I (1091) cpu_start: Pro cpu start user code
I (1152) cpu_start: Starting scheduler on PRO CPU.
OSError: [Errno 2] ENOENT
MicroPython v1.9.2-272-g0d183d7f on 2017-09-16; ESP32 module with ESP32
Type "help()" for more information.
>>>
You should verify that the firmware version shown in the banner after the reset matches the firmware that you just loaded. In this case, v1.9.2-272-g0d183d7f.
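As one more sanity check (my own addition, not part of the original notes), a couple of quick calls at the MicroPython prompt confirm the interpreter is alive; os.uname() and machine.freq() are standard MicroPython functions on the ESP32 port:

>>> import os, machine
>>> os.uname()       # reports the MicroPython release and build string
>>> machine.freq()   # current CPU frequency in Hz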
May the Zen of Python be with you.
September 26, 2017
Python Data
Text Analytics with Python – A book review
This is a book review of Text Analytics with Python: A Practical Real-World Approach to Gaining Actionable Insights from your Data by Dipanjan Sarkar
One of my go-to books for natural language processing with Python has been Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit by Steven Bird, Ewan Klein, and Edward Loper. This has been the book for me and was one of my dissertation references. I used this book so much that I had to buy a second copy because I wore the first one out. I’ve read many other NLP books but haven’t found any that could match this book – till now.
Text Analytics with Python: A Practical Real-World Approach to Gaining Actionable Insights from your Data by Dipanjan Sarkar is a fantastic book and has now taken a permanent place on my bookshelf.
Unlike many books that I run across, this book spends plenty of time talking about the theory behind things rather than just doing some hand-waving and then showing some code. In fact, there isn’t any code (that I saw) until page 41. That’s impressive these days. Here’s a quick overview of the book’s layout:
- Chapter 1 provides the baseline for Natural Language. This is a very good overview for anyone that’s never worked much with NLP.
- Chapter 2 is a python ‘refresher’. If you don’t know python at all but know some other language, this should get you started enough to use the rest of the book.
- Chapters 3 – 7 are where the real fun begins. These chapters cover Text Classification, Summarization, Similarity / Clustering, and Semantic / Sentiment Analysis.
If you have some familiarity with python and NLP, you can jump to Chapter 3 and dive into the details.
What I really like about this book is that it places theory first. I’m a big fan of ‘learning by doing’ but I think before you can ‘do’ you need to know ‘why’ you are doing what you are doing. The code in the book is really well done as well and uses the NLTK, Sklearn and gensim libraries for most of the work. Additionally, there are multiple ‘build your own’ sections where the author provides a very good overview (and walk-through) of what it takes to build your own functionality for your own NLP work.
This book is highly recommended.
Links in this post:
Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit by Steven Bird, Ewan Klein, and Edward Loper.
Text Analytics with Python: A Practical Real-World Approach to Gaining Actionable Insights from your Data by Dipanjan Sarkar
The post Text Analytics with Python – A book review appeared first on Python Data.
Weekly Python Chat
Linting Code
Code linting: it's not about fuzzy stuff, it's about readability. Let's chat about this.
Stack Abuse
Reading and Writing CSV Files in Python
What is a CSV File?
A CSV (Comma Separated Values) file is a file that uses a certain formatting for storing data. This file format organizes information, containing one record per line, with each field (column) separated by a delimiter. The most commonly used delimiter is the comma.
This format is so common that it has actually been standardized in RFC 4180. However, this standard isn't always followed and there is a lack of universal standard usage. The exact format used can sometimes depend on the application it's being used for.
CSV files are commonly used because they're easy to read and manage, they're small in size, and fast to process/transfer. Because of these benefits, they are frequently used in software applications, ranging anywhere from online e-commerce stores to mobile apps to desktop tools. For example, Magento, an e-commerce platform, is known for its support of CSV.
In addition, many applications, such as Microsoft Excel, Notepad, and Google Docs, can be used to import or export CSV files.
The csv Python Module
The csv module implements classes to operate with CSV files. It is focused on the format that is preferred by Microsoft Excel. However, its functionality is extensive enough to work with CSV files that use different delimiters and quoting characters.
This module provides the functions reader and writer, which work in a sequential manner. It also has the DictReader and DictWriter classes to manage your CSV data in the form of a Python dictionary object.
csv.reader
The csv.reader(csvfile, dialect='excel', **fmtparams) method can be used to extract data from a file that contains CSV-formatted data.
It takes the following parameters:
- csvfile: An object that supports the iterator protocol, which in this case is usually a file object for the CSV file
- dialect (optional): The name of the dialect to use (which will be explained in later sections)
- fmtparams (optional): Formatting parameters that will overwrite those specified in the dialect
This method returns a reader object, which can be iterated over to retrieve the lines of your CSV. The data is read as a list of strings. If we specify the QUOTE_NONNUMERIC format, non-quoted values are converted into float values.
An example on how to use this method is given in the Reading CSV Files section of this article.
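As a quick aside, here is a small sketch of my own (not from the article) showing the effect of QUOTE_NONNUMERIC; the single row of data is a hypothetical example where the text value is quoted and the numbers are not:

import csv
import io

data = io.StringIO('"temperature",21.5,3\n')
reader = csv.reader(data, quoting=csv.QUOTE_NONNUMERIC)
print(next(reader))  # ['temperature', 21.5, 3.0] -- unquoted fields become floats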
csv.writer
The csv.writer(csvfile, dialect='excel', **fmtparams) method, which is similar to the reader method we described above, is a method that permits us to write data to a file in CSV format.
This method takes the following parameters:
- csvfile: Any object with a write() method, which in this case is usually a file object
- dialect (optional): The name of the dialect to use
- fmtparams (optional): Formatting parameters that will overwrite those specified in the dialect
A note of caution with this method: if the csvfile parameter specified is a file object, it needs to have been opened with newline=''. If this is not specified, newlines inside quoted fields will not be interpreted correctly, and depending on the working platform, extra characters, such as '\r', may be added.
csv.DictReader and csv.DictWriter
The csv module also provides us the DictReader and DictWriter classes, which allow us to read and write to files using dictionary objects.
The class DictReader() works in a similar manner as csv.reader, but in Python 2 it maps the data to a dictionary and in Python 3 it maps data to an OrderedDict. The keys are given by the fieldnames parameter.
And just like DictReader, the class DictWriter() works very similarly to the csv.writer method, although it maps the dictionary to output rows. However, be aware that since Python's dictionaries are not ordered, we cannot predict the row order in the output file.
Both of these classes include an optional parameter to use dialects.
Dialects
A dialect, in the context of reading and writing CSVs, is a construct that allows you to create, store, and re-use various formatting parameters for your data.
Python offers two different ways to specify formatting parameters. The first is by declaring a subclass of this class, which contains the specific attributes. The second is by directly specifying the formatting parameters, using the same names as defined in the Dialect class.
Dialect supports several attributes. The most frequently used are:
- Dialect.delimiter: Used as the separating character between fields. The default value is a comma (,).
- Dialect.quotechar: Used to quote fields containing special characters. The default is the double-quote (").
- Dialect.lineterminator: Used to create newlines. The default is '\r\n'.
Use this class to tell the csv module how to interact with your non-standard CSV data.
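The examples below use the second approach, via csv.register_dialect(). As a sketch of the first approach (my own example, not from the article), you can subclass an existing dialect -- here the built-in excel dialect, so the remaining attributes keep sensible defaults -- and pass the class itself to the reader:

import csv
import io

class SlashDialect(csv.excel):
    # inherit the excel dialect's defaults, override only what differs
    delimiter = '/'
    quoting = csv.QUOTE_NONE

data = io.StringIO('1/2/3\nGood morning/Good afternoon/Good evening\n')
for row in csv.reader(data, dialect=SlashDialect):
    print(row)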
Versions
One important thing to note if you're using Python 2.7: it isn't as easy to support Unicode input in this version of Python, so you may need to ensure all of your input is in UTF-8 or printable ASCII characters.
A CSV File Example
We can create a CSV file easily with a text editor or even Excel. In the example below, the Excel file has a combination of numbers (1, 2 and 3) and words (Good morning, Good afternoon, Good evening), each of them in a different cell.

To save this file as a CSV, click File->Save As, then in the Save As window, select "Comma Separated Values (.csv)" under the Format dropdown. Save it as csvexample.csv for later use.
The structure of the CSV file can be seen using a text editor, such as Notepad or Sublime Text. Here, we can get the same values as in the Excel file, but separated by commas.
1,2,3
Good morning,Good afternoon,Good evening
We will use this file in the following examples.
We can also change the delimiter to something other than a comma, like a forward slash ('/'). Make this change in the file above, replacing all of the commas with forward slashes, and save it as csvexample2.csv for later use. It will look as follows:
1/2/3
Good morning/Good afternoon/Good evening
This is also valid CSV data, as long as we use the correct dialect and formatting to read/write the data, which in this case would require a '/' delimiter.
Reading CSV Files
A Simple CSV File
In this example we are going to show how you can read the csvexample.csv file, which we created and explained in a previous section. The code is as follows:
import csv

with open('csvexample.csv', newline='') as myFile:
    reader = csv.reader(myFile)
    for row in reader:
        print(row)
In this code we open our CSV file as myFile and then use the csv.reader method to extract the data into the reader object, which we can then iterate over to retrieve each line of our data. For this example, to show that the data was actually read, we just print it to the console.
If we save the code in a file named reader.py and we run it, the result should show the following:
$ python reader.py
['1', '2', '3']
['Good morning', 'Good afternoon', 'Good evening']
As we can see from running this code, we obtain the contents of the csvexample.csv file, which are printed to the console, except that now it is in a structured form that we can more easily work with in our code.
Changing the Delimiter
The csv module allows us to read CSV files, even when some of the file format characteristics are different from the standard formatting. For example, we can read a file with a different delimiter, like tabs, periods, or even spaces (any character, really). In our other example, csvexample2.csv, we have replaced the comma with a forward slash to demonstrate this.
In order to perform the same task as above with this new formatting, we must modify the code to indicate the new delimiter being used. In this example, we have saved the code in a file named reader2.py. The modified program is as follows:
import csv

with open('csvexample2.csv', newline='') as myFile:
    reader = csv.reader(myFile, delimiter='/', quoting=csv.QUOTE_NONE)
    for row in reader:
        print(row)
As we can see from the code above, we have modified the third line of code by adding the delimiter parameter and assigning a value of '/' to it. This tells the method to treat all '/' characters as the separating point between column data.
We have also added the quoting parameter, and assigned it a value of csv.QUOTE_NONE, which means that the method should not use any special quoting while parsing. As expected, the result is similar to the previous example:
$ python reader2.py
['1', '2', '3']
['Good morning', 'Good afternoon', 'Good evening']
As you can see, thanks to the small changes in the code we still get the same expected result.
Creating a Dialect
The csv module allows us to create a dialect with the specific characteristics of our CSV file. Thus, the same result from above can also be achieved with the following code:
import csv

csv.register_dialect('myDialect', delimiter='/', quoting=csv.QUOTE_NONE)

with open('csvexample2.csv', newline='') as myFile:
    reader = csv.reader(myFile, dialect='myDialect')
    for row in reader:
        print(row)
Here we create and register our own named dialect, which in this case uses the same formatting parameters as before (forward slashes and no quoting). We then specify to csv.reader that we want to use the dialect we registered by passing its name as the dialect parameter.
If we save this code in a file named reader3.py and run it, the result will be as follows:
$ python reader3.py
['1', '2', '3']
['Good morning', 'Good afternoon', 'Good evening']
Again, this output is exactly the same as above, which means we correctly parsed the non-standard CSV data.
Writing to CSV Files
Just like reading CSVs, the csv module appropriately provides plenty of functionality to write data to a CSV file as well. The writer object presents two functions, namely writerow() and writerows(). The difference between them, as you can probably tell from the names, is that the first function will only write one row, and the function writerows() writes several rows at once.
The code in the example below creates a list of data, with each element in the outer list representing a row in the CSV file. Then, our code opens a CSV file named csvexample3.csv, creates a writer object, and writes our data to the file using the writerows() method.
import csv

myData = [[1, 2, 3], ['Good Morning', 'Good Evening', 'Good Afternoon']]

myFile = open('csvexample3.csv', 'w')
with myFile:
    writer = csv.writer(myFile)
    writer.writerows(myData)
The resulting file, csvexample3.csv, should have the following text:
1,2,3
Good Morning,Good Evening,Good Afternoon
The writer object also caters to other CSV formats as well. The following example creates and uses a dialect with '/' as delimiter:
import csv

myData = [[1, 2, 3], ['Good Morning', 'Good Evening', 'Good Afternoon']]

csv.register_dialect('myDialect', delimiter='/', quoting=csv.QUOTE_NONE)

myFile = open('csvexample4.csv', 'w')
with myFile:
    writer = csv.writer(myFile, dialect='myDialect')
    writer.writerows(myData)
Similar to our "reading" example, we create a dialect in the same way (via csv.register_dialect()) and use it in the same way, by specifying it by name.
And again, running the code above results in the following output to our new csvexample4.csv file:
1/2/3
Good Morning/Good Evening/Good Afternoon
Using Dictionaries
In many cases, our data won't be formatted as a 2D array (as we saw in the previous examples), and it would be nice if we had better control over the data we read. To help with this problem, the csv module provides helper classes that lets us read/write our CSV data to/from dictionary objects, which makes the data much easier to work with.
Interacting with your data in this way is much more natural for most Python applications and will be easier to integrate into your code thanks to the familiarity of dict.
Reading a CSV File with DictReader
Using your favorite text editor, create a CSV file named countries.csv with the following content:
country,capital
France,Paris
Italy,Rome
Spain,Madrid
Russia,Moscow
Now, the format of this data might look a little bit different than our examples before. The first row in this file contains the field/column names, which provides a label for each column of data. The rows in this file contain pairs of values (country, capital) separated by a comma. These labels are optional, but tend to be very helpful, especially when you have to actually look at this data yourself.
In order to read this file, we create the following code:
import csv

with open('countries.csv') as myFile:
    reader = csv.DictReader(myFile)
    for row in reader:
        print(row['country'])
We still loop through each row of the data, but notice how we can now access each row's columns by their label, which in this case is the country. If we wanted, we could also access the capital with row['capital'].
Running the code results in the following:
$ python readerDict.py
France
Italy
Spain
Russia
Writing to a File with DictWriter
We can also create a CSV file using our dictionaries. In the code below, we create a dictionary with the country and capital fields. Then we create a writer object that writes data to our countries.csv file, which has the set of fields previously defined with the list myFields.
Following that, we first write the header row with the writeheader() method, and then the pairs of values using the writerow() method. Each value's position in the row is specified using the column label. You can probably imagine how useful this becomes when you have tens or even hundreds of columns in your CSV data.
import csv

myFile = open('countries.csv', 'w')
with myFile:
    myFields = ['country', 'capital']
    writer = csv.DictWriter(myFile, fieldnames=myFields)
    writer.writeheader()
    writer.writerow({'country': 'France', 'capital': 'Paris'})
    writer.writerow({'country': 'Italy', 'capital': 'Rome'})
    writer.writerow({'country': 'Spain', 'capital': 'Madrid'})
    writer.writerow({'country': 'Russia', 'capital': 'Moscow'})
And finally, running this code gives us the correct CSV output, with labels and all:
country,capital
France,Paris
Italy,Rome
Spain,Madrid
Russia,Moscow
Conclusion
CSV files are a handy file storage format that many developers use in their projects. They're small, easy to manage, and widely used throughout software development. Lucky for you, Python has a dedicated module for them that provides flexible methods and classes for managing CSV files in a straightforward and efficient manner.
In this article we showed you how to use the csv Python module to both read and write CSV data to a file. In addition to this, we also showed how to create dialects, and use helper classes like DictReader and DictWriter to read and write CSVs from/to dict objects.
Continuum Analytics Blog
Anaconda and Microsoft Partner to Deliver Python-Powered Machine Learning
Strata Data Conference, NEW YORK––September 26, 2017––Anaconda, Inc., the most popular Python data science platform provider, today announced it is partnering with Microsoft to embed Anaconda into Azure Machine Learning, Visual Studio and SQL Server to deliver data insights in real time.
hypothesis.works articles
When multiple bugs attack
When Hypothesis finds an example triggering a bug, it tries to shrink the example down to something simpler that triggers it. This is a pretty common feature, and most property-based testing libraries implement something similar (though there are a number of differences between them). Stand-alone test case reducers are also fairly common, as it’s a useful thing to be able to do when reporting bugs in external projects - rather than submitting a giant file triggering the bug, a good test case reducer can often shrink it down to a couple of lines.
But there’s a problem with doing this: How do you know that the bug you started with is the same as the bug you ended up with?
This isn’t just an academic question. It’s very common for the bug you started with to slip to another one.
Consider for example, the following test:
from hypothesis import given, strategies as st


def mean(ls):
    return sum(ls) / len(ls)


@given(st.lists(st.floats()))
def test(ls):
    assert min(ls) <= mean(ls) <= max(ls)
This has a number of interesting ways to fail: We could pass NaN, we could
pass [-float('inf'), +float('inf')], we could pass numbers which trigger a
precision error, etc.
But after test case reduction, we’ll pass the empty list and it will fail because we tried to take the min of an empty sequence.
This isn’t necessarily a huge problem - we’re still finding a bug after all (though in this case as much in the test as in the code under test) - and sometimes it’s even desirable - you find more bugs this way, and sometimes they’re ones that Hypothesis would have missed - but often it’s not, and an interesting and rare bug slips to a boring and common one.
Historically Hypothesis has had a better answer to this than most - because of the Hypothesis example database, all intermediate bugs are saved and a selection of them will be replayed when you rerun the test. So if you fix one bug then rerun the test, you’ll find the other bugs that were previously being hidden from you by that simpler bug.
But that’s still not a great user experience - it means that you’re not getting nearly as much information as you could be, and you’re fixing bugs in Hypothesis’s priority order rather than yours. Wouldn’t it be better if Hypothesis just told you about all of the bugs it found and you could prioritise them yourself?
Well, as of Hypothesis 3.29.0, released a few weeks ago, now it does!
If you run the above test now, you’ll get the following:
Falsifying example: test(ls=[nan])
Traceback (most recent call last):
File "/home/david/hypothesis-python/src/hypothesis/core.py", line 671, in run
print_example=True, is_final=True
File "/home/david/hypothesis-python/src/hypothesis/executors.py", line 58, in default_new_style_executor
return function(data)
File "/home/david/hypothesis-python/src/hypothesis/core.py", line 120, in run
return test(*args, **kwargs)
File "broken.py", line 8, in test
def test(ls):
File "/home/david/hypothesis-python/src/hypothesis/core.py", line 531, in timed_test
result = test(*args, **kwargs)
File "broken.py", line 9, in test
assert min(ls) <= mean(ls) <= max(ls)
AssertionError
Falsifying example: test(ls=[])
Traceback (most recent call last):
File "/home/david/hypothesis-python/src/hypothesis/core.py", line 671, in run
print_example=True, is_final=True
File "/home/david/hypothesis-python/src/hypothesis/executors.py", line 58, in default_new_style_executor
return function(data)
File "/home/david/hypothesis-python/src/hypothesis/core.py", line 120, in run
return test(*args, **kwargs)
File "broken.py", line 8, in test
def test(ls):
File "/home/david/hypothesis-python/src/hypothesis/core.py", line 531, in timed_test
result = test(*args, **kwargs)
File "broken.py", line 9, in test
assert min(ls) <= mean(ls) <= max(ls)
ValueError: min() arg is an empty sequence
You can add @seed(67388524433957857561882369659879357765) to this test to reproduce this failure.
Traceback (most recent call last):
File "broken.py", line 12, in <module>
test()
File "broken.py", line 8, in test
def test(ls):
File "/home/david/hypothesis-python/src/hypothesis/core.py", line 815, in wrapped_test
state.run()
File "/home/david/hypothesis-python/src/hypothesis/core.py", line 732, in run
len(self.falsifying_examples,)))
hypothesis.errors.MultipleFailures: Hypothesis found 2 distinct failures.
(The stack traces are a bit noisy, I know. We have an issue open about cleaning them up).
All of the different bugs are minimized simultaneously and take full advantage of Hypothesis’s example shrinking, so each bug is as easy (or hard) to read as if it were the only bug we’d found.
This isn’t perfect: The heuristic we use for determining if two bugs are the same is whether they
have the same exception type and the exception is thrown from the same line. This will necessarily
conflate some bugs that are actually different - for example, [float('nan')],
[-float('inf'), float('inf')] and [3002399751580415.0, 3002399751580415.0, 3002399751580415.0]
each trigger the assertion in the test, but they are arguably “different” bugs.
But that’s OK. The heuristic is deliberately conservative - the point is not that it can distinguish whether any two examples are the same bug, just that any two examples it distinguishes are different enough that it’s interesting to show both, and this heuristic definitely manages that.
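To make that concrete, here is a tiny sketch of the kind of bucketing key such a heuristic might compute -- my own illustration, not Hypothesis's actual code:

import traceback

def bug_key(exc):
    # Identify a failure by its exception type plus the file and line it was
    # raised from; examples with different keys are reported as distinct bugs.
    frame = traceback.extract_tb(exc.__traceback__)[-1]
    return (type(exc), frame.filename, frame.lineno)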
As far as I know this is a first in property-based testing libraries (though something like it is common in fuzzing tools, and theft is hot on our tail with something similar) and there’s been some interesting related but mostly orthogonal research in Erlang QuickCheck.
It was also surprisingly easy.
A lot of things went right in writing this feature, some of them technical, some of them social, somewhere in between.
The technical ones are fairly straightforward: Hypothesis’s core model turned out to be very well suited to this feature. Because Hypothesis has a single unified intermediate representation which defines a total ordering for simplicity, adapting Hypothesis to shrink multiple things at once was quite easy - whenever we attempt a shrink and it produces a different bug than the one we were looking for, we compare it to our existing best example for that bug and replace it if the current one is better (or we’ve discovered a new bug). We then just repeatedly run the shrinking process for each bug we know about until they’ve all been fully shrunk.
This is in a sense not surprising - I’ve been thinking about the problem of multiple-shrinking for a long time and, while this is the first time it’s actually appeared in Hypothesis, the current choice of model was very much informed by it.
The social ones are perhaps more interesting. Certainly I’m very pleased with how they turned out here.
The first is that this work emerged tangentially from the recent Stripe funded work - Stripe paid me to develop some initial support for testing Pandas code with Hypothesis, and I observed a bunch of bug slippage happening in the wild while I was testing that (it turns out there are quite a lot of ways to trigger exceptions from Pandas - they weren’t really Pandas bugs so much as bugs in the Pandas integration, but they still slipped between several different exception types), so that was what got me thinking about this problem again.
Not by accident, this feature also greatly simplified the implementation of the new deadline feature that Smarkets funded, which was going to have to have a lot of logic about how deadlines and bugs interacted, but all that went away as soon as we were able to handle multiple bugs sensibly.
This has been a relatively consistent theme in Hypothesis development - practical problems tend to spark related interesting theoretical developments. It’s not a huge exaggeration to say that the fundamental Hypothesis model exists because I wanted to support testing Django nicely. So the recent funded development from Stripe and Smarkets has been a great way to spark a lot of seemingly unrelated development and improve Hypothesis for everyone, even outside the scope of the funded work.
Another thing that really helped here is our review process, and the review from Zac in particular.
This wasn’t the feature I originally set out to develop. It started out life as a much simpler feature that used much of the same machinery, and just had a goal of avoiding slipping to new errors all together. Zac pushed back with some good questions around whether this was really the correct thing to do, and after some experimentation and feedback I eventually hit on the design that lead to displaying all of the errors.
Our review handbook emphasises that code review is a collaborative design process, and I feel this was a particularly good example of that. We’ve created a great culture of code review, and we’re reaping the benefits (and if you want to get in on it, we could always use more people able and willing to do review…).
All told, I’m really pleased with how this turned out. I think it’s a nice example of getting a lot of things right up front and this resulting in a really cool new feature.
I’m looking forward to seeing how it behaves in the wild. If you notice any particularly fun examples, do let me know, or write up a post about them yourself!
Read more...
Python Software Foundation
Join the Python Developers Survey 2017: Share and learn about the community
2017 is drawing to a close and we are super-excited to start the official Python Developers Survey 2017!
We’ve created this survey specially for Python developers who use it as their primary or supplementary language. We expect the survey findings to help us map an accurate landscape of the Python developer community and to provide insight into the current major trends in the Python community.
Your valuable opinion and feedback will help us better understand how different Python developers use Python and related frameworks, tools and technologies. We also hope you'll have fun going through the questions.
The survey is organized in partnership between the Python Software Foundation and JetBrains. After the survey is over, we will publish the aggregated results and randomly choose 100 winners (from those who complete the survey in its entirety), who will each receive an amazing Python Surprise Gift Pack.
Simple is Better Than Complex
How to Create Django Data Migrations
Data migrations are a very convenient way to change the data in the database in conjunction with changes in the schema. They work like regular schema migrations: Django keeps track of dependencies, order of execution, and whether the application has already applied a given data migration or not.
A common use case of data migrations is when we need to introduce new fields that are non-nullable. Or when we are creating a new field to store a cached count of something, so we can create the new field and add the initial count.
In this post we are going to explore a simple example that you can very easily extend and modify for your needs.
Data Migrations
Let’s suppose we have an app named blog, which is installed in our project’s INSTALLED_APPS.
The blog app has the following model definition:
blog/models.py
from django.db import models


class Post(models.Model):
    title = models.CharField(max_length=255)
    date = models.DateTimeField(auto_now_add=True)
    content = models.TextField()

    def __str__(self):
        return self.title
The application is already using this Post model; it’s already in production and there is plenty of data stored in the database.
| id | title | date | content |
|---|---|---|---|
| 1 | How to Render Django Form Manually | 2017-09-26 11:01:20.547000 | […] |
| 2 | How to Use Celery and RabbitMQ with Django | 2017-09-26 11:01:39.251000 | […] |
| 3 | How to Setup Amazon S3 in a Django Project | 2017-09-26 11:01:49.669000 | […] |
| 4 | How to Configure Mailgun To Send Emails in a Django Project | 2017-09-26 11:02:00.131000 | […] |
Now let’s say we want to introduce a new field named slug which will be used to compose the new URLs of the blog. The slug field must be unique and not null.
Generally speaking, always add new fields either as null=True or with a default value. If we can’t solve the
problem with the default parameter, first create the field as null=True then create a data migration for it. After
that we can then create a new migration to set the field as null=False.
Here is how we can do it:
blog/models.py
from django.db import models


class Post(models.Model):
    title = models.CharField(max_length=255)
    date = models.DateTimeField(auto_now_add=True)
    content = models.TextField()
    slug = models.SlugField(null=True)

    def __str__(self):
        return self.title
Create the migration:
python manage.py makemigrations blog
Migrations for 'blog':
blog/migrations/0002_post_slug.py
- Add field slug to post
Apply it:
python manage.py migrate blog
Operations to perform:
Apply all migrations: blog
Running migrations:
Applying blog.0002_post_slug... OK
At this point, the database already has the slug column.
| id | title | date | content | slug |
|---|---|---|---|---|
| 1 | How to Render Django Form Manually | 2017-09-26 11:01:20.547000 | […] | (null) |
| 2 | How to Use Celery and RabbitMQ with Django | 2017-09-26 11:01:39.251000 | […] | (null) |
| 3 | How to Setup Amazon S3 in a Django Project | 2017-09-26 11:01:49.669000 | […] | (null) |
| 4 | How to Configure Mailgun To Send Emails in a Django Project | 2017-09-26 11:02:00.131000 | […] | (null) |
Create an empty migration with the following command:
python manage.py makemigrations blog --empty
Migrations for 'blog':
blog/migrations/0003_auto_20170926_1105.py
Now open the file 0003_auto_20170926_1105.py, and it should have the following contents:
blog/migrations/0003_auto_20170926_1105.py
# -*- coding: utf-8 -*-
# Generated by Django 1.11.5 on 2017-09-26 11:05
from __future__ import unicode_literals

from django.db import migrations


class Migration(migrations.Migration):

    dependencies = [
        ('blog', '0002_post_slug'),
    ]

    operations = [
    ]
Then here in this file, we can create a function that can be executed by the RunPython command:
blog/migrations/0003_auto_20170926_1105.py
# -*- coding: utf-8 -*-
# Generated by Django 1.11.5 on 2017-09-26 11:05
from __future__ import unicode_literals

from django.db import migrations
from django.utils.text import slugify


def slugify_title(apps, schema_editor):
    '''
    We can't import the Post model directly as it may be a newer
    version than this migration expects. We use the historical version.
    '''
    Post = apps.get_model('blog', 'Post')
    for post in Post.objects.all():
        post.slug = slugify(post.title)
        post.save()


class Migration(migrations.Migration):

    dependencies = [
        ('blog', '0002_post_slug'),
    ]

    operations = [
        migrations.RunPython(slugify_title),
    ]
In the example above we are using the slugify utility function. It takes a string as a parameter and transforms it into a slug. Here are some examples:
from django.utils.text import slugify
slugify('Hello, World!')
'hello-world'
slugify('How to Extend the Django User Model')
'how-to-extend-the-django-user-model'
The function used by the RunPython operation to create a data migration expects two parameters: apps and schema_editor; RunPython will feed in those parameters. Also remember to load models using the apps.get_model('app_name', 'model_name') method.
Save the file and execute the migration as you would do with a regular model migration:
python manage.py migrate blog
Operations to perform:
Apply all migrations: blog
Running migrations:
Applying blog.0003_auto_20170926_1105... OK
Now if we check the database:
| id | title | date | content | slug |
|---|---|---|---|---|
| 1 | How to Render Django Form Manually | 2017-09-26 11:01:20.547000 | […] | how-to-render-django-form-manually |
| 2 | How to Use Celery and RabbitMQ with Django | 2017-09-26 11:01:39.251000 | […] | how-to-use-celery-and-rabbitmq-with-django |
| 3 | How to Setup Amazon S3 in a Django Project | 2017-09-26 11:01:49.669000 | […] | how-to-setup-amazon-s3-in-a-django-project |
| 4 | How to Configure Mailgun To Send Emails in a Django Project | 2017-09-26 11:02:00.131000 | […] | how-to-configure-mailgun-to-send-emails-in-a-django-project |
Every Post entry now has a slug value, so we can safely switch from null=True to null=False. And since all the values are unique, we can also add the unique=True flag.
Change the model:
blog/models.py
from django.db import models


class Post(models.Model):
    title = models.CharField(max_length=255)
    date = models.DateTimeField(auto_now_add=True)
    content = models.TextField()
    slug = models.SlugField(null=False, unique=True)

    def __str__(self):
        return self.title
Create a new migration:
python manage.py makemigrations blog
This time you will see the following prompt:
You are trying to change the nullable field 'slug' on post to non-nullable without a default; we can't do that
(the database needs something to populate existing rows).
Please select a fix:
1) Provide a one-off default now (will be set on all existing rows with a null value for this column)
2) Ignore for now, and let me handle existing rows with NULL myself (e.g. because you added a RunPython or RunSQL
operation to handle NULL values in a previous data migration)
3) Quit, and let me add a default in models.py
Select an option:
Select option 2 by typing “2” in the terminal.
Migrations for 'blog':
blog/migrations/0004_auto_20170926_1422.py
- Alter field slug on post
Now we can safely apply the migration:
python manage.py migrate blog
Operations to perform:
Apply all migrations: blog
Running migrations:
Applying blog.0004_auto_20170926_1422... OK
Conclusions
Data migrations are tricky sometimes. When creating data migrations for your projects, always examine the production data first. The implementation of slugify_title I used in the example is a little naïve, because it could generate duplicate slugs for a large dataset. Always test data migrations first in a staging environment, to avoid breaking things in production.
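If duplicate slugs are a real concern, one possible extension of slugify_title (my own sketch, not from the original post) is to append a counter whenever a slug has already been used within the migration run:

from django.utils.text import slugify

def slugify_title(apps, schema_editor):
    Post = apps.get_model('blog', 'Post')
    seen = set()
    for post in Post.objects.all():
        base = slugify(post.title) or 'post'
        slug, counter = base, 2
        # append -2, -3, ... until the slug is unique within this run
        while slug in seen:
            slug = '%s-%d' % (base, counter)
            counter += 1
        seen.add(slug)
        post.slug = slug
        post.save()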
It’s also important to do it step-by-step, so you can feel in control of the changes you are introducing. Note that here I create three migration files for a simple data migration.
As you can see, it’s fairly easy to create this type of migration, and it’s also very flexible. You could, for example, load an external text file to insert data into a new column.
The source code used in this blog post is available on GitHub: https://github.com/sibtc/data-migrations-example
S. Lott
Learning About Data Science.
I work with data scientists. I am not a scientist.
eGenix.com
Python Meeting Düsseldorf - 2017-09-27
The following announcement is for a regional user group meeting in Düsseldorf, Germany; it was originally published in German.
Announcement
The next Python Meeting Düsseldorf will take place on the following date:
27.09.2017, 18:00 (6:00 pm)
Room 1, 2nd floor, Bürgerhaus Stadtteilzentrum Bilk
Düsseldorfer Arcaden, Bachstr. 145, 40217 Düsseldorf
News
Talks already registered
Dr. Uwe Ziegenhagen
"Data Analysis with Python pandas"
Charlie Clark
"Type Systems in Python"
Further talks are welcome and can still be registered. If interested, please contact info@pyddf.de.
Start time and location
We meet at 18:00 in the Bürgerhaus in the Düsseldorfer Arcaden.
The Bürgerhaus shares its entrance with the swimming pool and is located next to the underground parking entrance of the Düsseldorfer Arcaden.
A large "Schwimm’ in Bilk" logo hangs above the entrance. Behind the door, turn immediately left to the two elevators, then go up to the 2nd floor. The entrance to Room 1 is directly on the left as you step out of the elevator.
>>> Entrance in Google Street View
Introduction
The Python Meeting Düsseldorf is a regular event in Düsseldorf aimed at Python enthusiasts from the region.
Our PyDDF YouTube channel, where we publish videos of the talks after each meeting, offers a good overview of past presentations. The meeting is organized by eGenix.com GmbH, Langenfeld, in cooperation with Clark Consulting & Research, Düsseldorf.
Program
The Python Meeting Düsseldorf uses a mix of (lightning) talks and open discussion.
Talks can be registered in advance or brought up spontaneously during the meeting. A projector with XGA resolution is available. To register a (lightning) talk, simply send an informal email to info@pyddf.de.
Cost sharing
The Python Meeting Düsseldorf is organized by Python users for Python users.
Since the meeting room, projector, internet access and drinks incur costs, we ask participants for a contribution of EUR 10.00 incl. 19% VAT. Pupils and students pay EUR 5.00 incl. 19% VAT.
We ask all participants to bring the amount in cash.
Registration
Since we only have seats for about 20 people, we kindly ask you to register by email. Registration does not create any obligation; it simply makes planning easier for us.
To register for the meeting, simply send an informal email to info@pyddf.de.
Further information
Further information can be found on the meeting's website:
http://pyddf.de/
Have fun!
Marc-Andre Lemburg, eGenix.com



