
Planet Python

Last update: June 29, 2017 10:49 AM

June 29, 2017

Curtis Miller

Stock Trading Analytics and Optimization in Python with PyFolio, R’s PerformanceAnalytics, and backtrader

Introduction Having figured out how to perform walk-forward analysis in Python with backtrader, I want to have a look at evaluating a strategy’s performance. So far, I have cared about only one metric: the final value of the account at the end of a backtest relative to its initial value. This should not be the only metric considered. Most… Read more: Stock Trading Analytics and Optimization in Python with PyFolio, R’s PerformanceAnalytics, and backtrader

June 29, 2017 04:12 AM

June 28, 2017

Mike Driscoll

Meta: The new Mouse Vs Python Newsletter

I recently decided to try giving my readers the option of signing up for a weekly round up of the articles that I publish to this blog. I added it to my Follow the Blog page, but if you’re interested in getting an email once a week that includes links to all the articles from the past week, you can also sign up below:



I will note that this is a bit experimental for me and I am currently attempting to get the emails formatted correctly. I believe I finally have something that looks right, but there may be some minor changes that happen over the next couple of weeks as I learn the platform.

June 28, 2017 10:15 PM

wxPython – Getting Data From All Columns in a ListCtrl

Every now and then, I see someone asking how to get the text for each item in a row of a ListCtrl in report mode. The ListCtrl does not make it very obvious how you would get the text in row one, column three for example. In this article we will look at how we might accomplish this task.

Getting Data from Any Column

Let’s start by creating a simple ListCtrl and using a button to populate it. Then we’ll add a second button for extracting the contents of the ListCtrl:

import wx

class MyForm(wx.Frame):

    def __init__(self):
        wx.Frame.__init__(self, None, wx.ID_ANY, "List Control Tutorial")

        # Add a panel so it looks correct on all platforms
        panel = wx.Panel(self, wx.ID_ANY)
        self.index = 0

        self.list_ctrl = wx.ListCtrl(panel, size=(-1, 100),
                                     style=wx.LC_REPORT|wx.BORDER_SUNKEN)
        self.list_ctrl.InsertColumn(0, 'Subject')
        self.list_ctrl.InsertColumn(1, 'Due')
        self.list_ctrl.InsertColumn(2, 'Location', width=125)

        btn = wx.Button(panel, label="Add Line")
        btn2 = wx.Button(panel, label="Get Data")
        btn.Bind(wx.EVT_BUTTON, self.add_line)
        btn2.Bind(wx.EVT_BUTTON, self.get_data)

        sizer = wx.BoxSizer(wx.VERTICAL)
        sizer.Add(self.list_ctrl, 0, wx.ALL|wx.EXPAND, 5)
        sizer.Add(btn, 0, wx.ALL|wx.CENTER, 5)
        sizer.Add(btn2, 0, wx.ALL|wx.CENTER, 5)
        panel.SetSizer(sizer)

    def add_line(self, event):
        line = "Line %s" % self.index
        self.list_ctrl.InsertStringItem(self.index, line)
        self.list_ctrl.SetStringItem(self.index, 1, "01/19/2010")
        self.list_ctrl.SetStringItem(self.index, 2, "USA")
        self.index += 1

    def get_data(self, event):
        count = self.list_ctrl.GetItemCount()
        cols = self.list_ctrl.GetColumnCount()
        for row in range(count):
            for col in range(cols):
                item = self.list_ctrl.GetItem(itemId=row, col=col)
                print item.GetText()

# Run the program
if __name__ == "__main__":
    app = wx.App(False)
    frame = MyForm()
    frame.Show()
    app.MainLoop()

Let’s take a moment to break this code down a bit. The first button’s event handler is the first piece of interesting code. It demonstrates how to insert data into the ListCtrl. As you can see, that’s pretty straightforward as all we need to do to add a row is call InsertStringItem and then set each column’s text using SetStringItem. There are other types of items that we can insert into a ListCtrl besides a String Item, but that’s outside the scope of this article.

Next we should take a look at the get_data event handler. It grabs the row count using the ListCtrl’s GetItemCount method. We also get the number of columns in the ListCtrl via GetColumnCount. Finally we loop over the rows and extract each cell, which in ListCtrl parlance is known as an “item”. We use the ListCtrl’s GetItem method for this task. Now that we have the item, we can call the item’s GetText method to extract the text and print it to stdout.
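If you need this extraction in more than one place, the nested loop from get_data can be wrapped in a small standalone helper (the name get_all_rows is my own, not from the original tutorial). It relies only on the GetItemCount, GetColumnCount, and GetItem/GetText methods discussed above:

```python
def get_all_rows(list_ctrl):
    """Return the contents of a report-mode ListCtrl as a list of rows,
    where each row is a list of that row's column strings."""
    return [
        [list_ctrl.GetItem(itemId=row, col=col).GetText()
         for col in range(list_ctrl.GetColumnCount())]
        for row in range(list_ctrl.GetItemCount())
    ]
```

You could then call `data = get_all_rows(self.list_ctrl)` from any handler instead of repeating the loop.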

Associating Objects to Rows

An easier way to do this sort of thing would be to associate an object to each row. Let’s take a moment to see how this might be accomplished:

import wx

class Car(object):
    def __init__(self, make, model, year, color="Blue"):
        self.make = make
        self.model = model
        self.year = year
        self.color = color

class MyPanel(wx.Panel):
    def __init__(self, parent):
        wx.Panel.__init__(self, parent)

        rows = [Car("Ford", "Taurus", "1996"),
                Car("Nissan", "370Z", "2010"),
                Car("Porche", "911", "2009", "Red")]

        self.list_ctrl = wx.ListCtrl(self, size=(-1, 100),
                                     style=wx.LC_REPORT|wx.BORDER_SUNKEN)
        self.list_ctrl.Bind(wx.EVT_LIST_ITEM_SELECTED, self.onItemSelected)
        self.list_ctrl.InsertColumn(0, "Make")
        self.list_ctrl.InsertColumn(1, "Model")
        self.list_ctrl.InsertColumn(2, "Year")
        self.list_ctrl.InsertColumn(3, "Color")

        index = 0
        self.myRowDict = {}
        for row in rows:
            self.list_ctrl.InsertStringItem(index, row.make)
            self.list_ctrl.SetStringItem(index, 1, row.model)
            self.list_ctrl.SetStringItem(index, 2, row.year)
            self.list_ctrl.SetStringItem(index, 3, row.color)
            self.myRowDict[index] = row
            index += 1

        sizer = wx.BoxSizer(wx.VERTICAL)
        sizer.Add(self.list_ctrl, 0, wx.ALL|wx.EXPAND, 5)
        self.SetSizer(sizer)

    def onItemSelected(self, event):
        currentItem = event.m_itemIndex
        car = self.myRowDict[currentItem]
        print car.make, car.model, car.year, car.color

class MyFrame(wx.Frame):
    def __init__(self):
        wx.Frame.__init__(self, None, wx.ID_ANY, "List Control Tutorial")
        panel = MyPanel(self)

if __name__ == "__main__":
    app = wx.App(False)
    frame = MyFrame()
    frame.Show()
    app.MainLoop()

In this example, we have a Car class that we use to create Car objects. These Car objects are then associated with rows in the ListCtrl. Take a look at MyPanel‘s __init__ method and you will see that we create a list of row objects and then loop over it, inserting each object into the ListCtrl using the object’s attributes for the text values. You will also note that we have created an instance attribute dictionary that we use for associating each row’s index with the Car object that was inserted into that row.

We also bind the ListCtrl to EVT_LIST_ITEM_SELECTED so when an item is selected, it will call the onItemSelected method and print out the data from the row. You will note that we get the row’s index by using event.m_itemIndex. The rest of the code should be self-explanatory.
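Stripped of the wx plumbing, the association technique is just a dictionary keyed by row index. The sketch below uses the same Car class and rows as the example above to show the two halves of the pattern:

```python
class Car(object):
    def __init__(self, make, model, year, color="Blue"):
        self.make = make
        self.model = model
        self.year = year
        self.color = color

# The same rows as in the example above.
rows = [Car("Ford", "Taurus", "1996"),
        Car("Nissan", "370Z", "2010"),
        Car("Porche", "911", "2009", "Red")]

# What MyPanel.__init__ builds: a mapping from row index to object.
myRowDict = {index: row for index, row in enumerate(rows)}

# What onItemSelected does with the index it reads from the event:
selected = myRowDict[1]
print(selected.make, selected.model)
```

Because the dictionary stores the objects themselves, you get every column’s data back in one lookup, with no GetItem calls at all.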

Wrapping Up

Now you know a couple of different approaches for extracting all the data from a ListCtrl. Personally, I really like using the ObjectListView widget. I feel it is superior to the ListCtrl as it has these kinds of features built in. But it’s not included with wxPython, so it’s an extra install.

Additional Reading

June 28, 2017 05:15 PM

Catalin George Festila

The Google API Client Library python module.

The Google API Client Library for Python is a client library for accessing the Plus, Moderator, and many other Google APIs, according to the official documentation.

C:\Python27\Scripts>pip install --upgrade google-api-python-client
Collecting google-api-python-client
Downloading google_api_python_client-1.6.2-py2.py3-none-any.whl (52kB)
100% |################################| 61kB 426kB/s
Successfully installed google-api-python-client-1.6.2 ...
The example I used is this:
from oauth2client.client import flow_from_clientsecrets
from oauth2client.file import Storage
from apiclient.discovery import build
import httplib2
import webbrowser

def get_credentials():
    scope = ''
    flow = flow_from_clientsecrets(
        'client_id.json', scope,
        # redirect URI for installed apps that paste the auth code manually
        redirect_uri='urn:ietf:wg:oauth:2.0:oob')
    storage = Storage('credentials.dat')
    credentials = storage.get()

    if not credentials or credentials.invalid:
        auth_uri = flow.step1_get_authorize_url()
        # open the authorization page in the browser to get the auth code
        webbrowser.open(auth_uri)
        auth_code = raw_input('Enter the auth code: ')
        credentials = flow.step2_exchange(auth_code)
        # save the credentials for the next run
        storage.put(credentials)
    return credentials

def get_service():
    """Returns an authorised blogger api service."""
    credentials = get_credentials()
    http = httplib2.Http()
    http = credentials.authorize(http)
    service = build('blogger', 'v3', http=http)
    return service

if __name__ == '__main__':
    served = get_service()
    print dir(served.blogs)
    users = served.users()

    # Retrieve this user's profile information
    thisuser = users.get(userId='self').execute()
    print('This user\'s display name is: %s' % thisuser['displayName'].encode('utf-8'))

    blogs = served.blogs()

    # Retrieve the list of Blogs this user has write privileges on
    thisusersblogs = blogs.listByUser(userId='self').execute()
    for blog in thisusersblogs['items']:
        print('The blog named \'%s\' is at: %s' % (blog['name'], blog['url']))
The result of this script is this:
['__call__', '__class__', '__cmp__', '__delattr__', '__doc__', '__format__', '__func__',
'__get__', '__getattribute__', '__hash__', '__init__', '__is_resource__', '__new__', 
'__reduce__', '__reduce_ex__', '__repr__', '__self__', '__setattr__', '__sizeof__', 
'__str__', '__subclasshook__', 'im_class', 'im_func', 'im_self']
This user's display name is: Cătălin George Feștilă
The blog named 'python-catalin' is at:
The blog named 'graphics' is at:
The blog named 'About me and my life ...' is at:
The blog named 'pygame-catalin' is at:
As for the Google settings: you need a Google account to use Google’s APIs.
The first step is to access the Google Developer’s Console.
Then navigate to the Developer Console’s projects page and create a new project for the application by clicking the Create project button, then enable the Blogger API.
Enter your project’s name and hit Create.
Click the Go to Credentials button and use settings like those shown in the next image:

Download the credential information in JSON format; in this case it is the client_id.json file.
When you run the script for the first time, a web page opens showing your auth code.
The example script will print this message:
C:\Python27\lib\site-packages\oauth2client\ UserWarning: Cannot access credentials.dat: No such file or directory
Enter the auth code:
Paste the auth code at the prompt and allow the script access to your Google account using the Allow button on the opened page.
Now you can run the example.

June 28, 2017 04:37 PM

EuroPython Society

Invitation to the EuroPython Society General Assembly 2017

We would like to officially invite all EuroPython Society (EPS) members to attend this year’s EPS General Assembly (GA), which we will run as an in-person meeting at the upcoming EuroPython 2017 conference, held in Rimini, Italy, from July 9 - 16.

Place of the General Assembly meeting:

We will meet on Thursday 13 July, at 14:35 CEST in room PythonAnywhere of the EuroPython 2017 conference venue Palacongressi di Rimini (Via della Fiera 23, Rimini).

There will be a talk inviting volunteers to participate in organizing EuroPython 2018, in preparation for next year’s event, at 14:00 CEST in the same room, right before the General Assembly. You may want to attend that talk as well. In this talk, we will present the EuroPython Workgroup Concept, which we have been using successfully for three years now.

General Assembly Agenda

The agenda for the assembly is defined by the EPS bylaws. We are planning to use the following structure:

Election of the members of the board

The EPS bylaws limit the number of board members to one chair and 2 - 8 directors, i.e. at most 9 board members in total. Experience has shown that the board members are the most active organizers of the EuroPython conference, so we try to get as many board members as possible to spread the work load.

All members of the EPS are free to nominate or self-nominate board members. Please write to us no later than Friday, July 7 2017, if you want to run for the board. We will then include you in the list we’ll have in the final nomination announcement before the GA, which is scheduled for July 7.

The following people from the current board have shown interest in running for board in the next term as well (in alphabetical order):

We will post more detailed information about the above candidates and any new nominations we receive in a separate blog post.

Propositions from the board

None at the moment.

The bylaws allow for additional propositions to be announced up until 5 days before the GA, so the above list is not necessarily the final list.

Motions from the members

None at the moment.

EPS members are entitled to suggest motions to be voted on at the GA. The bylaws require any such motions to be announced at least 5 days before the GA. If you would like to propose a motion, please send it to us no later than Friday, July 7 2017.


EuroPython Society

June 28, 2017 03:12 PM


EuroPython 2017: On-desk Rates and Day Passes

We will be switching to the on-desk rates for tickets on Monday next week (July 3rd), so this is your last chance to get tickets at the regular rate, which is about 30% less than the on-desk rate.


On-desk Rates

We will have the following three categories of ticket prices for the on-desk full conference tickets (all 8 days):

Please note that we do not sell on-desk student tickets. Students who decide late will have to buy day passes or a personal ticket.

Day Passes

As in the past, we will also sell day passes at the conference venue.

Day passes for the conference (valid for the day when you pick up the badge):

Please see the registration page for full details of what is included in the ticket price.


EuroPython 2017 Team
EuroPython Society
EuroPython 2017 Conference

June 28, 2017 01:34 PM

Caktus Consulting Group

Managing your AWS Container Infrastructure with Python

We deploy Python/Django apps to a wide variety of hosting providers at Caktus. Our django-project-template includes a Salt configuration to set up an Ubuntu virtual machine on just about any hosting provider, from scratch. We've also modified this a number of times for local hosting requirements when our customer required the application we built to be hosted on hardware they control. In the past, we also built our own tool for creating and managing EC2 instances automatically via the Amazon Web Services (AWS) APIs. In March, my colleague Dan Poirier wrote an excellent post about deploying Django applications to Elastic Beanstalk demonstrating how we’ve used that service.

AWS has added many managed services that help ease the process of hosting web applications on AWS. The most important addition to the AWS stack (for us) was undoubtedly Amazon RDS for Postgres, launched in November 2013. As long-time advocates of Postgres, this addition to the AWS suite was the final puzzle piece necessary for building an AWS infrastructure for a typical Django app that requires little to no manual management. Still, the suite of AWS tools and services is immense, and configuring these manually is time-consuming and error-prone; despite everything it offers, setting up "one-click" deploys to AWS (à la Heroku) is still a complex challenge.

In this post, I'll be discussing another approach to hosting Python/Django apps and managing server infrastructure on AWS. In particular, we'll be looking at a Python library called troposphere that allows you to describe AWS resources using Python and generate CloudFormation templates to upload to AWS. We'll also look at a sample collection of troposphere scripts I compiled as part of the preparation for this post, which I've named (at least for now) AWS Container Basics.

Introduction to CloudFormation and Troposphere

CloudFormation is Amazon's answer to automated resource provisioning. A CloudFormation template is simply a JSON file that describes AWS resources and the relationships between them. It allows you to define Parameters (inputs) to the template and even includes a small set of intrinsic functions for more complex use cases. Relationships between resources are defined using the Ref function.
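For orientation, here is a minimal CloudFormation template (my own illustrative example, not taken from the post) showing a Parameter and a Ref wiring its value into a resource:

```json
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Parameters": {
    "BucketNameParam": {
      "Type": "String",
      "Description": "Name for the S3 bucket"
    }
  },
  "Resources": {
    "AssetsBucket": {
      "Type": "AWS::S3::Bucket",
      "Properties": {
        "BucketName": {"Ref": "BucketNameParam"}
      }
    }
  }
}
```

Real templates grow quickly from here, which is exactly where hand-writing JSON becomes painful.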

Troposphere allows you to accomplish all of the same things, but with the added benefit of writing Python code rather than JSON. To give you an idea of how Troposphere works, here's a quick example that creates an S3 bucket for hosting (public) static assets for your application (e.g., in the event you wanted to host your Django static media on S3):

from troposphere import Join, Template
from troposphere.s3 import Bucket, CorsConfiguration, CorsRules, PublicRead

template = Template()
domain_name = ""  # the app's domain name (value elided in this post)

template.add_resource(Bucket(
    "AssetsBucket",
    AccessControl=PublicRead,
    CorsConfiguration=CorsConfiguration(CorsRules=[CorsRules(
        AllowedOrigins=[Join("", ["https://", domain_name])],
        AllowedMethods=["POST", "PUT", "HEAD", "GET"],
    )]),
))

print(template.to_json())

This generates a JSON dump that looks very similar to the corresponding Python code, which can be uploaded to CloudFormation to create and manage this S3 bucket. Why not just write this directly in JSON, one might ask? The advantages to using Troposphere are that:

  1. it gives you all the power of Python to describe or create resources conditionally (e.g., to easily provide multiple versions of the same template),
  2. it provides compile-time detection of naming or syntax errors, e.g., via flake8 or Python itself, and
  3. it also validates (most of) the structure of a template, e.g., ensuring that the correct object types are provided when creating a resource.

Troposphere does not detect all possible errors you might encounter when building a template for CloudFormation, but it does significantly improve one's ability to detect and fix errors quickly, without the need to upload the template to CloudFormation for a live test.

Supported resources

Creating an S3 bucket is a simple example, and you don't really need Troposphere to do that. How does this scale to larger, more complex infrastructure requirements?

As of the time of this post, Troposphere includes support for 39 different resource types (such as EC2, ECS, RDS, and Elastic Beanstalk). Perhaps most importantly, within its EC2 package, Troposphere includes support for creating VPCs, subnets, routes, and related network infrastructure. This means you can easily create a template for a VPC that is split across availability zones, and then programmatically define resources inside those subnets/VPCs. A stack for hosting an entire, self-contained application can be templated and easily duplicated for different application environments such as staging and production.

AWS managed services for a typical web app

AWS includes a wide array of managed services. Beyond EC2, what are some of the services one might need to host a Dockerized web application on AWS? Although each application is unique and will have differing managed service needs, some of the services one is likely to encounter when hosting a Python/Django (or any other) web application on AWS are:

  • S3 for storing and serving static and/or uploaded media
  • RDS for a Postgres (or MySQL) database
  • ElastiCache, which supports both Memcached and Redis, for a cache, session store, and/or message broker
  • CloudFront, which provides edge servers for faster serving of static resources
  • Certificate Manager, which provides a free SSL certificate for your AWS-provided load balancer and supports automatic renewal
  • Virtual Private Clouds (VPCs) for overall network management
  • Elastic Load Balancers (ELBs), which allow you to transparently spread traffic across Availability Zones (AZs). These are managed by AWS and the underlying IPs may change over time.

Provisioning your application servers

For hosting a Python/Django application on AWS, you have essentially four options:

  • Configure your application as a set of task definitions and/or services using the AWS Elastic Container Service (ECS). This is a complex service, and I don't recommend it as a starting point.
  • Create an Elastic Beanstalk Multicontainer Docker environment (which actually creates and manages an ECS Cluster for you behind the scenes). This provides much of the flexibility of ECS, but decouples the deployment and container definitions from the infrastructure. This makes it easier to set up your infrastructure once and be confident that you can continue to use it as your requirements for running additional tasks (e.g., background tasks via Celery) change over the lifetime of a project.
  • Configure an array of EC2 instances yourself, either by creating an AMI of your application or manually configuring EC2 instances with Salt, Ansible, Chef, Puppet, or another such tool. This is an option that facilitates migration for legacy applications that might already have all the tools in place to provision application servers, and it's typically fairly simple to modify these setups to point your application configuration to external database and cache servers. This is the only option available for projects using AWS GovCloud, which at the time of this post supports neither ECS nor EB.
  • Create an Elastic Beanstalk Python environment. This option is similar to configuring an array of EC2 instances yourself, but AWS manages provisioning the servers for you, based on the instructions you provide. This is the approach described in Dan's blog post on Amazon Elastic Beanstalk.

Putting it all together

This was originally a hobby / weekend learning project for me. I'm much indebted to the blog post by Jean-Philippe Serafin (no relation to Caktus) titled How to build a scalable AWS web app stack using ECS and CloudFormation, which I recommend reading to see how one can construct a comprehensive set of managed AWS resources in a single CloudFormation stack. Rather than repeat all of that here, however, I'm going to focus on some of the outcomes and potential uses for this project.

Jean-Philippe Serafin provided all the code for his blog post on GitHub. Starting from that, I've updated and released another project -- a workable solution for hosting fully-featured Python/Django apps, relying entirely on AWS managed services -- on GitHub under the name AWS Container Basics. It includes several configuration variants (thanks to Troposphere) that support stacks with and without NAT gateways as well as three of the application server hosting options outlined above (ECS, EB Multicontainer Docker, or EC2). Contributions are also welcome!

Setting up a demo

To learn more about how AWS works, I recommend creating a stack of your own to play with. You can do so for free if you have an account that's still within the 12-month free tier. If you don't have an account or it's past its free tier window, you can create a new account (AWS does not frown on individuals or companies having multiple accounts; in fact, it's encouraged as an approach for keeping different applications or even environments properly isolated). Once you have an account ready:

  • Make sure you have your preferred region selected in the console via the menu in the top right corner. Sometimes AWS selects an unintuitive default, even after you have resources created in another region.

  • If you haven't already, you'll need to upload your SSH public key to EC2 (or create a new key pair). You can do so from the Key Pairs section of the EC2 Console.

  • Next, click the button below to launch a new stack:
  • On the Select Template page:

  • On the Specify Details page:

    • Enter a Stack Name of your choosing. Names that can be distinguished via the first 5 characters are better, because the name will be trimmed when generating names for the underlying AWS resources.
    • Change the instance types if you wish, however, note that the t2.micro instance type is available within the AWS free tier for EC2, RDS, and ElastiCache.
    • Enter a DatabaseEngineVersion. I recommend using the latest version of Postgres supported by RDS. As of the time of this post, that is 9.6.2.
    • Generate and add a random DatabasePassword for RDS. While the stack is configured to pass this to your application automatically (via DATABASE_URL), RDS and CloudFormation do not support generating their own passwords at this time.
    • Enter a DomainName. This should be a fully-qualified domain name. Your email address (or one you have access to) should be listed in the Whois database for the domain. The domain name will be used for several things, including generation of a free SSL certificate via the AWS Certificate Manager. When you create the stack, you will receive an email asking you to approve the certificate (which you must do before the stack will finish creating). The DNS for this domain doesn't need to exist yet (you'll update this later).
    • For the KeyName, select the key you created or uploaded in the prior step.
    • For the SecretKey, generate a random SECRET_KEY which will be added to the environment (for use by Django, if needed). If your application doesn't need a SECRET_KEY, enter a dummy value here. This can be changed later, if needed.
    • Once you're happy with the values, click Next.
  • On the Options page, click Next (no additional tags, permissions, or notifications are necessary, so these can all be left blank).

  • On the Review page, double check that everything is correct, check the "I acknowledge that AWS CloudFormation might create IAM resources." box, and click Create.

The stack will take about 30 minutes to create, and you can monitor its progress by selecting the stack on the CloudFormation Stacks page and monitoring the Resources and/or Events tabs.
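The random DatabasePassword and SecretKey values asked for in the steps above can be generated locally. This sketch uses Python's secrets module (Python 3.6+); the 50-character length is an arbitrary choice of mine, not a stack requirement:

```python
import secrets
import string

def random_token(length=50):
    """Generate a random alphanumeric token, usable for the
    DatabasePassword or SecretKey stack parameters."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))

print(random_token())  # use as DatabasePassword
print(random_token())  # use as SecretKey
```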

Using the demo

When it is finished, you'll have an Elastic Beanstalk Multicontainer Docker environment running inside a dedicated VPC, along with an S3 bucket for static assets (including an associated CloudFront distribution), a private S3 bucket for uploaded media, a Postgres database, and a Redis instance for caching, session storage, and/or use as a task broker. The environment variables provided to your container are as follows:

  • AWS_STORAGE_BUCKET_NAME: The name of the S3 bucket in which your application should store static assets.
  • AWS_PRIVATE_STORAGE_BUCKET_NAME: The name of the S3 bucket in which your application should store private/uploaded files or media (make sure you configure your storage backend to require authentication to read objects and encrypt them at rest, if needed).
  • CDN_DOMAIN_NAME: The domain name of the CloudFront distribution connected to the above S3 bucket; you should use this (or the S3 bucket URL directly) to refer to static assets in your HTML.
  • DOMAIN_NAME: The domain name you specified when creating the stack, which will be associated with the automatically-generated SSL certificate.
  • SECRET_KEY: The secret key you specified when creating this stack
  • DATABASE_URL: The URL to the RDS instance created as part of this stack.
  • REDIS_URL: The URL to the Redis instance created as part of this stack (may be used as a cache or session storage, e.g.). Note that Redis supports multiple databases and no database ID is included as part of the URL, so you should append a forward slash and the integer index of the database, e.g., /0.
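Appending the Redis database index described above is a one-liner in application settings. The sketch below is a hypothetical Django configuration (the fallback URL and the django-redis backend are my illustrative choices, not part of the stack):

```python
import os

# REDIS_URL is injected by the stack into the container environment;
# the fallback here is an illustrative stand-in, not a real host.
redis_url = os.environ.get("REDIS_URL", "redis://example-host:6379")

# Redis serves multiple numbered databases, so append the index, e.g. /0.
redis_url = redis_url + "/0"

# A hypothetical Django cache configuration using it (django-redis backend):
CACHES = {
    "default": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": redis_url,
    }
}
```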

Optional: Uploading your Docker image to the EC2 Container Registry

One of the AWS resources created by AWS Container Basics is an EC2 Container Registry (ECR) repository. If you're using Docker and don't have a place to store images already (or would prefer to consolidate hosting at AWS to simplify authentication), you can push your Docker image to ECR. You can build and push your Docker image as follows:

DOCKER_TAG=$(git rev-parse HEAD)  # or "latest", if you prefer
$(aws ecr get-login --region <region>)
docker build -t <stack-name> .
docker tag <stack-name>:$DOCKER_TAG <account-id>.dkr.ecr.<region>.amazonaws.com/<stack-name>:$DOCKER_TAG
docker push <account-id>.dkr.ecr.<region>.amazonaws.com/<stack-name>:$DOCKER_TAG

You will need to replace <stack-name> with the name of the stack you entered above, <account-id> with your AWS Account ID, and <region> with your AWS region. You can also see these commands with the appropriate variables filled in by clicking the "View Push Commands" button on the Amazon ECS Repository detail page in the AWS console (note that AWS defaults to using a DOCKER_TAG of latest instead of using the Git commit SHA).

Updating existing stacks

CloudFormation, and by extension Troposphere, also support the concept of "updating" existing stacks. This means you can take an existing CloudFormation template such as AWS Container Basics, fork and tweak it to your needs, and upload the new template to CloudFormation. CloudFormation will calculate the minimum changes necessary to implement the change, inform you of what those are, and give you the option to proceed or decline. Some changes can be done as modifications whereas other, more significant changes (such as enabling encryption on an RDS instance or changing the solution stack for an Elastic Beanstalk environment) require destroying and recreating the underlying resource. CloudFormation will inform you if it needs to do this, so inspect the proposed change list carefully.

Coming Soon: Deployment

In the next post, I'll go over several options for deploying to your newly created stack. In the meantime, the AWS Container Basics README describes one simple option.

June 28, 2017 01:30 PM

Catalin George Festila

The pyquery python module.

This tutorial covers the pyquery Python module, using Python 2.7.13.
First, I used the pip command to install it.

C:\Python27>cd Scripts

C:\Python27\Scripts>pip install pyquery
Collecting pyquery
Downloading pyquery-1.2.17-py2.py3-none-any.whl
Requirement already satisfied: lxml>=2.1 in c:\python27\lib\site-packages (from pyquery)
Requirement already satisfied: cssselect>0.7.9 in c:\python27\lib\site-packages (from pyquery)
Installing collected packages: pyquery
Successfully installed pyquery-1.2.17
I tried to install it with pip under Python 3.4 as well, but I got errors.
The development team describes this Python module as follows:
pyquery allows you to make jQuery queries on XML documents. The API is as similar as possible to jQuery. pyquery uses lxml for fast XML and HTML manipulation.
Let's try a simple example with this python module.
The example finds links by HTML tag.
from pyquery import PyQuery

seeds = [
    # seed URLs (elided in the original post)
]

crawl_frontiers = []

def start_crawler():
    global crawl_frontiers
    crawl_frontiers = crawler_seeds()

def crawler_seeds():
    frontiers = []
    for index, seed in enumerate(seeds):
        frontier = {index: read_links(seed)}
        frontiers.append(frontier)
    return frontiers

def read_links(seed):
    crawler = PyQuery(seed)
    return [crawler(tag_a).attr("href") for tag_a in crawler("a")]

The read_links function takes the links from each seed in the seeds array.
To do that, I read the links and put them into another array, crawl_frontiers.
The frontiers array is used just for the crawler process.
This simple example also helps you understand arrays better.
You can read more about this python module here.

June 28, 2017 10:41 AM

Kelly Yancey

Witty python-requests Session Object

See the full pull request on GitHub.

June 28, 2017 09:30 AM

Matthew Rocklin

Use Apache Parquet

This work is supported by Continuum Analytics and the Data Driven Discovery Initiative from the Moore Foundation.

This is a tiny blogpost to encourage you to use Parquet instead of CSV for your dataframe computations. I’ll use Dask.dataframe here but Pandas would work just as well. I’ll also use my local laptop here, but Parquet is an excellent format to use on a cluster.

CSV is convenient, but slow

I have the NYC taxi cab dataset on my laptop stored as CSV

mrocklin@carbon:~/data/nyc/csv$ ls
yellow_tripdata_2015-01.csv  yellow_tripdata_2015-07.csv
yellow_tripdata_2015-02.csv  yellow_tripdata_2015-08.csv
yellow_tripdata_2015-03.csv  yellow_tripdata_2015-09.csv
yellow_tripdata_2015-04.csv  yellow_tripdata_2015-10.csv
yellow_tripdata_2015-05.csv  yellow_tripdata_2015-11.csv
yellow_tripdata_2015-06.csv  yellow_tripdata_2015-12.csv

This is a convenient format for humans because we can read it directly.

mrocklin@carbon:~/data/nyc/csv$ head yellow_tripdata_2015-01.csv
2,2015-01-15 19:05:39,2015-01-15
1,2015-01-10 20:33:38,2015-01-10
1,2015-01-10 20:33:38,2015-01-10
1,2015-01-10 20:33:39,2015-01-10
1,2015-01-10 20:33:39,2015-01-10
1,2015-01-10 20:33:39,2015-01-10
1,2015-01-10 20:33:39,2015-01-10
1,2015-01-10 20:33:39,2015-01-10
1,2015-01-10 20:33:39,2015-01-10

We can use tools like Pandas or Dask.dataframe to read in all of this data. Because the data is large-ish, I’ll use Dask.dataframe

mrocklin@carbon:~/data/nyc/csv$ du -hs .
22G .
In [1]: import dask.dataframe as dd

In [2]: %time df = dd.read_csv('yellow_tripdata_2015-*.csv')
CPU times: user 340 ms, sys: 12 ms, total: 352 ms
Wall time: 377 ms

In [3]: df.head()
VendorID tpep_pickup_datetime tpep_dropoff_datetime  passenger_count  \
0         2  2015-01-15 19:05:39   2015-01-15 19:23:42                1
1         1  2015-01-10 20:33:38   2015-01-10 20:53:28                1
2         1  2015-01-10 20:33:38   2015-01-10 20:43:41                1
3         1  2015-01-10 20:33:39   2015-01-10 20:35:31                1
4         1  2015-01-10 20:33:39   2015-01-10 20:52:58                1

   trip_distance  pickup_longitude  pickup_latitude  RateCodeID  \
0           1.59        -73.993896        40.750111           1
1           3.30        -74.001648        40.724243           1
2           1.80        -73.963341        40.802788           1
3           0.50        -74.009087        40.713818           1
4           3.00        -73.971176        40.762428           1

  store_and_fwd_flag  dropoff_longitude  dropoff_latitude  payment_type \
0                  N         -73.974785         40.750618             1
1                  N         -73.994415         40.759109             1
2                  N         -73.951820         40.824413             2
3                  N         -74.004326         40.719986             2
4                  N         -74.004181         40.742653             2

   fare_amount  extra  mta_tax  tip_amount  tolls_amount  \
0         12.0    1.0      0.5        3.25           0.0
1         14.5    0.5      0.5        2.00           0.0
2          9.5    0.5      0.5        0.00           0.0
3          3.5    0.5      0.5        0.00           0.0
4         15.0    0.5      0.5        0.00           0.0

   improvement_surcharge  total_amount
0                    0.3         17.05
1                    0.3         17.80
2                    0.3         10.80
3                    0.3          4.80
4                    0.3         16.30

In [4]: from dask.diagnostics import ProgressBar

In [5]: ProgressBar().register()

In [6]: df.passenger_count.sum().compute()
[########################################] | 100% Completed |
3min 58.8s
Out[6]: 245566747

We were able to ask questions about this data (and learn that roughly 245 million taxi rides were taken in 2015) even though it is too large to fit into memory. This is because Dask is able to operate lazily from disk. It reads in the data on an as-needed basis and then forgets it when it no longer needs it. This takes a while (about four minutes) but does just work.

However, when we read this data many times from disk we start to become frustrated by this four minute cost. In Pandas we suffered this cost once as we moved data from disk to memory. On larger datasets when we don’t have enough RAM we suffer this cost many times.

Parquet is faster

Let's try this same process with Parquet. I happen to have the exact same data stored in Parquet format on my hard drive.

mrocklin@carbon:~/data/nyc$ du -hs nyc-2016.parquet/
17G nyc-2016.parquet/

It is stored as a bunch of individual files, but we don’t actually care about that. We’ll always refer to the directory as the dataset. These files are stored in binary format. We can’t read them as humans

mrocklin@carbon:~/data/nyc$ head nyc-2016.parquet/part.0.parquet
<a bunch of illegible bytes>

But computers are much more able to both read and navigate this data. Let's do the same experiment as before:

In [1]: import dask.dataframe as dd

In [2]: df = dd.read_parquet('nyc-2016.parquet/')

In [3]: df.head()
  tpep_pickup_datetime  VendorID tpep_dropoff_datetime  passenger_count  \
0  2015-01-01 00:00:00         2   2015-01-01 00:00:00                3
1  2015-01-01 00:00:00         2   2015-01-01 00:00:00                1
2  2015-01-01 00:00:00         1   2015-01-01 00:11:26                5
3  2015-01-01 00:00:01         1   2015-01-01 00:03:49                1
4  2015-01-01 00:00:03         2   2015-01-01 00:21:48                2

    trip_distance  pickup_longitude  pickup_latitude  RateCodeID  \
0           1.56        -74.001320        40.729057           1
1           1.68        -73.991547        40.750069           1
2           4.00        -73.971436        40.760201           1
3           0.80        -73.860847        40.757294           1
4           2.57        -73.969017        40.754269           1

  store_and_fwd_flag  dropoff_longitude  dropoff_latitude  payment_type  \
0                  N         -74.010208         40.719662             1
1                  N           0.000000          0.000000             2
2                  N         -73.921181         40.768269             2
3                  N         -73.868111         40.752285             2
4                  N         -73.994133         40.761600             2

   fare_amount  extra  mta_tax  tip_amount  tolls_amount  \
0          7.5    0.5      0.5         0.0           0.0
1         10.0    0.0      0.5         0.0           0.0
2         13.5    0.5      0.5         0.0           0.0
3          5.0    0.5      0.5         0.0           0.0
4         14.5    0.5      0.5         0.0           0.0

   improvement_surcharge  total_amount
0                    0.3           8.8
1                    0.3          10.8
2                    0.0          14.5
3                    0.0           6.3
4                    0.3          15.8

In [4]: from dask.diagnostics import ProgressBar

In [5]: ProgressBar().register()

In [6]: df.passenger_count.sum().compute()
[########################################] | 100% Completed |
Out[6]: 245566747

Same values, but now our computation happens in three seconds, rather than four minutes. We’re cheating a little bit here (pulling out the passenger count column is especially easy for Parquet) but generally Parquet will be much faster than CSV. This lets us work from disk comfortably without worrying about how much memory we have.


So do yourself a favor and convert your data

In [1]: import dask.dataframe as dd
In [2]: df = dd.read_csv('csv/yellow_tripdata_2015-*.csv')
In [3]: from dask.diagnostics import ProgressBar
In [4]: ProgressBar().register()
In [5]: df.to_parquet('yellow_tripdata.parquet')
[############                            ] | 30% Completed |  1min 54.7s

If you want to be more clever you can specify dtypes and compression when converting. This can yield significantly greater speedups, but even the default settings will still be a large improvement.
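For instance, declaring dtypes up front skips pandas/Dask's per-column type inference during the conversion. A minimal sketch with pandas on a tiny inline sample (the column subset and dtype choices here are my own illustration, not from the original post):

```python
import io
import pandas as pd

# A tiny stand-in for one of the taxi CSV files (illustrative columns only)
csv = io.StringIO(
    "VendorID,tpep_pickup_datetime,passenger_count,trip_distance\n"
    "2,2015-01-15 19:05:39,1,1.59\n"
    "1,2015-01-10 20:33:38,1,3.30\n"
)

# dtype= avoids type inference; parse_dates stores real timestamps
# instead of strings, so Parquet can encode them compactly.
df = pd.read_csv(
    csv,
    dtype={"VendorID": "int8", "passenger_count": "int8", "trip_distance": "float32"},
    parse_dates=["tpep_pickup_datetime"],
)
print(df.dtypes)

# Writing Parquet with compression (requires pyarrow or fastparquet):
# df.to_parquet("sample.parquet", compression="snappy")
```

The same `dtype=` keyword works with `dd.read_csv`, and `to_parquet` accepts `compression=` in both pandas and Dask.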


Parquet enables the following:

  1. Binary representation of data, allowing for speedy conversion of bytes-on-disk to bytes-in-memory
  2. Columnar storage, meaning that you can load in as few columns as you need without loading the entire dataset
  3. Row-chunked storage so that you can pull out data from a particular range without touching the others
  4. Per-chunk statistics so that you can find subsets quickly
  5. Compression

Parquet Versions

There are two nice Python packages with support for the Parquet format:

  1. pyarrow: Python bindings for the Apache Arrow and Apache Parquet C++ libraries
  2. fastparquet: a direct NumPy + Numba implementation of the Parquet format

Both are good. Both can do most things. Each has separate strengths. The code above used fastparquet by default but you can change this in Dask with the engine='arrow' keyword if desired.

June 28, 2017 12:00 AM

June 27, 2017

Bruno Rocha

The quality of the python ecosystem


Recently I talked about The Quality of the Python Ecosystem at "Caipyra", a very nice conference in Ribeirão Preto, Brazil.

Here are the slides (in English and also in Portuguese) and some pictures of the awesome event!


Recently I spoke about the Quality of the Python Ecosystem at the "Caipyra" event in Ribeirão Preto, Brazil.

Here are the slides (in English and Portuguese) and also some photos of this wonderful event!



The quality of the python ecosystem - and how we can protect it! from Bruno Rocha


A Qualidade do Ecossistema Python - e o que podemos fazer para mante-la from Bruno Rocha


See full picture collection here:

Extra notes and updates:

Nick Coghlan pointed to these two links: one on built-in safety and analytics tools, and one noting that the recommendation engine is open source.

June 27, 2017 10:57 PM

Python Engineering at Microsoft

PyData Seattle is next week!

Next week, we’ll be hosting PyData Seattle 2017 at Microsoft. Several hundred attendees, speakers and teachers will converge on our main conference center for three days of talks, tutorials, and other fun.

Microsoft Redmond campus. (Stephen Brashear/Getty Images)

What is PyData?

PyData is a regular conference that occurs all around the world multiple times every year. With a focus on data science, and an abundance of Python, it is one of the most relevant conferences in our field at the moment.

Sponsored by NumFOCUS, PyData conferences showcase the latest developments in libraries, tools, practices and services, presented by the developers and practitioners who create and use them. Many significant projects such as Jupyter, NumPy, pandas, Software Carpentry, and more are fiscally supported by NumFOCUS and have a presence at PyData.

How can I join in?

PyData Seattle has an amazing lineup of sessions this year. Registration is still open, but if you can’t make it to Redmond, Washington next week, we have another exciting option.

For the first time in PyData history, we will be livestreaming the keynotes and a selection of sessions during the conference (thanks to the support of the Channel 9 team)!

PyData Seattle 2017, July 6-7

As usual, every session will be recorded and published on YouTube after the conference, but the livestream will have additional content, interviews, and the opportunity to participate in Q&A using Twitter – in fact, all Q&A for the livestreamed sessions will be using Twitter, so you get to participate just as much as the people in the room.

Thursday and Friday of track 1 on the schedule will be broadcast, including the sessions below. We will also have some more informal chats with the people behind the PyData Seattle conference, and some of the projects represented there.

What should I do right now?

To prepare for the livestream, you can do the following today:

See you next week!

Livestreamed Sessions

See the schedule for updates.

Thursday 6th July

Friday 7th July

June 27, 2017 08:33 PM


Why Is NumPy Only Now Getting Funded?

Recently we announced that NumPy, a foundational package for scientific computing with Python, had received its first-ever grant funding since the start of the project in 2006. Community interest in this announcement was massive, raising our website traffic by over 2600%. The predominant reaction was one of surprise—how could it be that NumPy, of all projects, had never […]

June 27, 2017 07:40 PM

Continuum Analytics News

Continuum Analytics Appoints Aaron Barfoot As Chief Financial Officer

Wednesday, June 28, 2017

AUSTIN, TEXAS—June 28, 2017—Continuum Analytics, the creator and driving force behind Anaconda, the leading Open Data Science platform powered by Python, today announced Aaron Barfoot as the company’s new chief financial officer (CFO). Barfoot, a cloud hosting industry veteran and former executive at ClearDATA and Rackspace, will oversee both finance and accounting operations as the company continues to experience rapid growth and reinforces its footprint as the leader in Open Data Science.

“Our company is going through a substantial growth period. 2016 marked more than 11 million downloads, an increase of more than eight million from the previous year; we’re upwards of 25 million today with more than 4 million active users, and we’re seeing similar growth in product revenue,” said Scott Collison, CEO. “It’s the perfect time to welcome Aaron to the team; his background in finance and capital structure for large companies will prove instrumental to Continuum Analytics’ continued success.”
Anaconda adoption has increased by 37 percent from 2016 to 2017, according to a recent KDnuggets Data Science Poll.
“Continuum Analytics’ growth numbers point to an undeniable need for the critical insights data science delivers. The data it uncovers significantly impacts critical business outcomes and has the power to change the course of a company,” said Barfoot. “As the data science market begins to push $140 billion, I’m thrilled to join the Anaconda team and contribute to the success of the company as it is poised for continued growth.”
Mr. Barfoot previously held the position of vice president of finance for Rackspace (RAX) during the company’s six-year period of significant growth and success in the marketplace. His strategic planning and initiatives helped grow revenue from $362 million in 2007 to $1.5 billion in 2013 while also driving better margins and improving capital returns. He played a key role in acquisitions, provided financial support during the company’s $188 million IPO, and developed several financial planning systems to maximize resources and staffing.
Aaron Barfoot holds a Bachelor of Science in Economics from Baylor University.
About Anaconda Powered by Continuum Analytics
Anaconda is the leading Open Data Science platform powered by Python, the fastest growing data science language with more than 13 million downloads to date. Continuum Analytics is the creator and driving force behind Anaconda, empowering leading businesses across industries worldwide with solutions to identify patterns in data, uncover key insights and transform data into a goldmine of intelligence to solve the world’s most challenging problems. Anaconda puts superpowers into the hands of people who are changing the world. Learn more at
Media Contact:
Jill Rosenthal

June 27, 2017 07:02 PM

Python Anywhere

Using the PythonAnywhere API: an (open source) helper script to create a Django webapp with a virtualenv

With the beta launch of our API, we want to start making it possible for people to do more scripted things with PythonAnywhere.

Our starter for 10 was this: our web-based UI has some helpers for creating new web apps for common web frameworks (Django, Flask, web2py, etc), but they pin you to the system-installed version of those packages. Using a virtualenv would give the user more flexibility, but currently that means using the more complicated "manual config" option.

The API means it's now possible to build a single command-line tool that you can run from a PythonAnywhere console to create, from scratch, a new Django project, with a virtualenv, all in one go.

The command-line tool in a Bash console

The script is available by default in any PythonAnywhere Bash console. Here's its command-line help:

$ -h
Create a new Django webapp with a virtualenv.  Defaults to
your free domain, the latest version of Django and Python 3.6

Usage: [--domain=<domain> --django=<django-version> --python=<python-version>] [--nuke]
  --domain=<domain>         Domain name, eg   [default:]
  --django=<django-version> Django version, eg "1.8.4"  [default: latest]
  --python=<python-version> Python version, eg "2.7"    [default: 3.6]
  --nuke                    *Irrevocably* delete any existing web app config on this domain. Irrevocably.

Seeing it in action

If we launch the script and accept the defaults, we can watch it automatically going through all the steps you'd normally have to go through manually to create a new web app on PythonAnywhere:

screenshot of helper script building virtualenv etc

Lovely progress reports! [1]

Once it's run, you can see your web app is immediately live.

screenshot of django it worked page

And your code is ready to go in a folder predictably named after the site's domain name:

17:08 ~ $ tree -L 3                                                   
├── mysite
│   ├──
│   ├── __pycache__
│   │   ├── __init__.cpython-36.pyc
│   │   ├── settings.cpython-36.pyc
│   │   └── urls.cpython-36.pyc
│   ├──
│   ├──
│   └──
└── static
    └── admin
        ├── css
        ├── fonts
        ├── img
        └── js

Plans for the future -- it's open source!

The helper script is online on GitHub; issues, pull requests and suggestions are gratefully accepted.

The audacious vision of the future would be a script which takes the URL to a GitHub repo, and:

It's ambitious but it shouldn't be too hard to get working. We'd love to get you, the user, involved in shaping the way such a tool might work though.

(and of course, there's nothing preventing you from just writing your own script to do these things, and to hell with our own corporate versions! Feel free to let us know if you do that. Or keep it as your own little secret).

[1] That's right, forget cowsay, on PythonAnywhere we have snakesay

June 27, 2017 05:11 PM


Self-Taught Programmer: Interview with Cory Althoff

The topic of becoming a professional programmer from scratch is extremely hot nowadays, and that trend isn't going to change any time soon. The software development industry is growing faster than many others, finds new applications in other industries, and craves new talent at an extremely high pace. The problem is that there are plenty of development opportunities, but not enough people to do the job. As a result, many outreach programs and courses are being run by development communities, commercial companies of all sizes, and independent instructors to serve the growing demand for new professionals. Many of these newcomers arrive from the non-CS world. One of the most vivid examples in the Python world is the scientific/data analysis segment, which has grown incredibly fast and caught up with Python web development within just a couple of years. Not surprisingly, a big portion of these programs and courses are intended for people with little to no previous programming knowledge.


Today we publish an interview with Cory Althoff, a true self-taught programmer who graduated with a political science degree, but later decided to jump into software development. He met difficulties on the way to becoming a professional developer, but after a year of self-study, he learned to program well enough to land a job as a software engineer at eBay. He is now enjoying his new career. Just half a year ago he published The Self-Taught Programmer: The Definitive Guide to Programming Professionally to help other people willing to enter the industry. The book got some traction and many positive reviews on different book platforms, including Goodreads and Amazon. Read on to learn more about the man, his experience, and vision.

– Hi Cory, could you tell us a bit about yourself and how you became a self-taught programmer?

Hey there. My name is Cory Althoff. I am 28 years old, and I graduated from Clemson University with a major in political science. After graduation, I was living in Silicon Valley, and having trouble finding a job, so I decided to learn to program. After a year of self-study, I landed a job as a software engineer at eBay, followed by a position as a software engineer at a startup in Palo Alto. Eventually, my girlfriend and I left our jobs to backpack around Southeast Asia. We spent two months traveling in Thailand, Malaysia, Australia and Bali, where I came up with the idea to write The Self-Taught Programmer: The Definitive Guide to Programming Professionally.

– What was the idea and the motivation behind writing your own book?

My journey learning to program, and my experience at my first job as a software engineer were the inspiration for my book. The primary challenge I faced learning to program was that I didn’t know what I needed to learn. Most of the beginner programming books focus on helping you learn to write basic code in a language like Python or Ruby. But if you want to program professionally, there is so much more you need to learn: such as OOP or functional programming, version control, regular expressions, IDEs, best practices, computer science, developer tools, etc. To quote my book, “Many of the subjects covered in a single chapter of this book could be—and are—covered by entire books. My goal is not to cover every detail of every subject you need to know. My goal is to give you a map—an outline of all of the skills you need to develop to program professionally.” I wrote this book to give anyone interesting in learning to program professionally an overview of what they need to learn.

– Was Python your first programming language? How long have you been programming in it?

Yes, Python was the first language I learned. I’ve been programming in Python for around five years now. I also program in JavaScript, and I’ve dabbled in a few other languages, but Python is still my favorite language.

– How long did it take you to learn Python from scratch? How long do you think it takes a person on average to become an intermediate-level Python programmer?

It took me a year to become proficient in Python. That is the amount of time I would estimate someone could become an intermediate Python programmer (starting from scratch) if they are willing to put in the time and program seven days a week. However, I’ve been programming in Python for five years, and I am always learning more about the language, so it is a never-ending journey.

– One of the sections of your book is dedicated to PyCharm. At which stage of your development journey did you choose PyCharm and why?

When I first started programming with Python, I used IDLE. Eventually, I switched to PyScripter. When I got to eBay, everyone on my team used PyCharm. I decided to give it a try, and I’ve never gone back!

– What are your favorite PyCharm features that you can’t imagine working without?

My favorite PyCharm features are local history, the debugger, the command line, version control, and database interfaces. PyCharm’s local history feature saves me a ton of time. Anytime I make a change, PyCharm automatically saves it, which allows me to experiment much quicker because I know I can always quickly and easily revert to my previous code. Next up is the debugger, which I couldn’t imagine programming without. My favorite features of the debugger are the ability to set a conditional breakpoint and execute code once you are there. Finally, being able to use the command line, version control and connect to a database without leaving my IDE saves me time and makes me a more productive programmer.

– How much time did it take you to learn PyCharm from scratch? Do you think learning tools is a big part of a developer’s trade?

I was able to start using PyCharm immediately, but it took me a day or two to learn all its major features. Once I finished reading the Quick Start Guide, I was all set. Learning tools is a big part of a developer’s trade. When I was learning to program, I had no idea that was the case. That is why I chose to make learning programming tools one of the five sections of my book. Beginners need to learn to program, but they also need to learn to use tools like version control and their IDE, which they often overlook.

– What do you think about Python as a first language for learning programming?

The best first language to learn is a debate that always comes up in the Self-Taught Programmers Facebook group. The argument usually ends up being whether it is better to learn a high-level language like Python first, or a low-level language like C. I think Python is the best language for a new programmer because it increases your chances of successfully learning to program. When someone is learning to program, they need to get a “win” as fast as possible, like writing a program that does something interesting. If they can get there, it significantly increases their chances of continuing to learn. The problem with starting with a low-level language like C is that it dramatically increases the time it will take to get that first win, and therefore enhances the likelihood that person will give up before learning to program. Some people advocate for languages like JavaScript and Ruby instead of Python. But for me, Python is the best choice for beginners.

– What do you think makes Python so unique as a language?

Readability. I’ve never used a language easier to read than Python, which of course was one of Guido’s central insights when developing it. Python’s readability is one of the reasons I teach it in my book.

– Is there anything new you’re currently working on at the moment?

Currently, I am writing a new book called The Self-Taught Web Developer, managing the Self-Taught Programmers Facebook group, which just surpassed 16,000 members, and working on a new project called Take a Class with Me. Every month, I pick a technical course that participants take as a group. Then, each week, I organize a Slack chat where everyone can get together and discuss the material from it and help anyone that is stuck. Right now, the courses are for beginners, but next month I am adding courses for advanced programmers covering topics like Angular and machine learning. Make sure to sign up for the Take a Class with Me newsletter if you are interested in participating.

– Please recommend two other books on development you enjoyed reading recently or just consider important.

My two favorite technical books are The Pragmatic Programmer by Andy Hunt and Dave Thomas, and Problem Solving with Algorithms and Data Structures using Python by Brad Miller and David Ranum. The Pragmatic Programmer blew my mind. I learned so much from it, and it significantly improved me as a programmer. Problem Solving with Algorithms and Data Structures using Python is the best book on data structures and algorithms I’ve ever read. It is so much easier to read than the books that frequently are recommended, like Introduction to Algorithms. I would never have been able to pass my first technical interview without it.

– Thank you for the interview, Cory!

Thanks for having me!


Cory Althoff is a self-taught programmer and writer. He has worked as a software engineer at eBay, as well as several startups in Silicon Valley, despite majoring in Political Science at Clemson University. When taking a break from programming, you can find him reading and traveling—the idea for “The Self-Taught Programmer” originated in the back seat of a taxi in Bali. Currently, he lives in Portland, OR.

June 27, 2017 01:05 PM

S. Lott

OOP and FP -- Objects vs. Functional -- Avoiding reductionist thinking

Real Quote (lightly edited to remove tangential nonsense.)

Recently, I watched a video and it stated that OO is about nouns and Functional programming is about the verbs. Also, ... Aspect-Oriented Programming with the e Verification Language  by David Robinson 
It would be nice to have a blog post which summarized the various mindset associated w/ the various paradigms.

I find the word "mindset" to be challenging.

Yes. All Turing Complete programming languages do have a kind of fundamental equivalence at the level of computing stuff represented as numbers. This, however, seems reductionist.

["All languages were one language to him. All languages were 'woddly'." Paraphrased from James Thurber's "The Great Quillow", a must-read.]

So. Beyond the core Turing Completeness features of a language, the rest is reduced to a difference in "mindset"? The only difference is how we pronounce "Woddly?"

"Mindset" feels reductionist. It replaces a useful summary of language features with a dismissive "mindset" categorization of languages. In a way, this seems to result from bracketing technology choices as "religious wars," where the passion for a particular choice outweighs the actual relevance; i.e., "All languages have problems, so use Java."

In my day job, I work with three kinds of Python problems:
  • Data Science
  • API Services
  • DevOps/TechOps Automation
In many cases, one person can have all three problems. These aren't groups of people. These are problem domains.

I think the first step is to replace "mindset" with "problem domain". It's a start, but I'm not sure it's that simple.

When someone has a data science problem, they often solve it with purely functional features of Python. Generally they do this via numpy, but I've been providing examples of generator expressions and comprehensions in my Code Dojo webinars. Generator expressions are an elegant, functional approach to working with stateless data objects.

In Python 3, the following kind of code doesn't create gigantic intermediate data structures. The functional transformations are applied to each item generated by the "source".

from collections import Counter

x = map(transform, source)
y = filter(selector_rule, x)
z = Counter(y)

I prefer to suggest that a fair amount of data analysis involves little or no mutation of state. Functional features of a language seem to work well with immutable data.

There is state change, but it's at a macro level. For example, the persistence of capturing data is a large-scale state change that's often implemented at the OS level, not the language level.
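To make the snippet above concrete, here is a self-contained version with an illustrative transform and selection rule (the data and rule are my own stand-ins):

```python
from collections import Counter

source = ["10", "25", "3", "42", "25", "8"]

def transform(item):
    return int(item)      # parse each raw record

def selector_rule(value):
    return value > 5      # keep only the values we care about

# No intermediate lists are built: each item flows lazily through
# map and filter, one at a time, before Counter consumes it.
x = map(transform, source)
y = filter(selector_rule, x)
z = Counter(y)
print(z)  # Counter({25: 2, 10: 1, 42: 1, 8: 1})
```

The whole pipeline touches one item at a time, which is exactly why this style scales to data that doesn't fit in memory.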

When someone's building an API, on the other hand, they're often working with objects that have a mutating state. Some elements of an API will involve state change, and objects model state change elegantly. RESTful API's can deserialize objects from storage, make changes, and serialize the modified object again.

[This summary of RESTful services is also reductionist, and therefore, possibly unhelpful.]

When there's mutability, then objects might be more appropriate than functions.

I'm reluctant to call this "mindset." It may not be "problem domain." It seems to be a model that involves mutable or immutable state.

When someone's automating their processing, they're wrestling with OS features, and OS management of state change. They might be installing stuff, or building Docker images, or gluing together items in a CI/CD pipeline, setting the SSL keys, or figuring out how to capture Behave output as part of Gherkin acceptance testing. Lots of interesting stuff that isn't the essential problem at hand, but is part of building a useful, automated solution to the problem.

The state in these problems is maintained by the OS. Application code may -- or may not -- try to model that state.

When doing Blue/Green deployments, for example, the blueness and greenness isn't part of the server farm, it's part of an internal model of how the servers are being used. This seems to be stateful; object-oriented programming might be helpful. When the information can be gleaned from asset management tools, then perhaps a functional processing stream is more important for gathering, deciding, and taking action.

I'm leaning toward the second view-point, and suggesting that some of the OO DevOps programming might be better looked at as functional map-filter-reduce processing. Something like

action_to_take = some_decision_pipeline(current_state, new_deployment)

This reflects the questions of state change. It may not be the right abstraction though, because carrying out the action is, itself, a difficult problem that involves determining the state of the server farm, and then applying some change to one or more servers.
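A minimal sketch of what such a pipeline could look like, in that map-filter style (every name and data shape here is hypothetical, not from any real deployment tool):

```python
def some_decision_pipeline(current_state, new_deployment):
    """Decide which servers need action, functionally: filter the
    servers whose color doesn't match the target, then map each one
    to an action tuple. All names here are illustrative."""
    desired = new_deployment['target_color']
    stale = (s for s in current_state if s['color'] != desired)
    return [('repaint', s['name'], desired) for s in stale]

# A toy server farm: one blue server, one already green.
current_state = [
    {'name': 'web1', 'color': 'blue'},
    {'name': 'web2', 'color': 'green'},
]
actions = some_decision_pipeline(current_state, {'target_color': 'green'})
print(actions)  # [('repaint', 'web1', 'green')]
```

The point is less the specific helpers than the shape: state observation in, actions out, with no object holding mutable deployment state in between.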

We often think of server state change as imperative in nature. It feels like object-oriented programming. There are steps, the object models those steps. I'm not sure that's right. I think there's a repeated "determine next action" cycle here. Sometimes it involves waiting for an action to finish. Yes, it sounds like polling the server farm. I'm not sure that's wrong. How else do you know a server is crashed except by polling it?

I think we've moved a long way from "mindset."

I think it's about fitting language features to a problem in a way that creates the right abstraction to capture (and eventually) solve the problem.

I haven't mentioned Aspect-Oriented Programming because it seems to cut across the functional/object state management boundary. It's a distinctive approach to organizing reusable functionality. I don't mean to dismiss it as uninteresting. I mean to set it aside as orthogonal to the "mutable state" consideration that seems to be one of the central differences between OOP and FP.

In response to the request: "No. I won't map mindset to paradigm."

June 27, 2017 08:00 AM

Talk Python to Me

#118 Serverless software

Let's consider the progression we've been on over the past 15 or so years.

We've gone from software and operating systems that we manage running on hardware that we own (and babysit), to virtual machines on our hardware, to IaaS in the cloud and PaaS in the cloud. Then onward to containers, usually Docker, running on someone else's systems in the cloud, and maybe even microservices, which are conglomerates of these containers working together, managed by Kubernetes.

Where do we go from there? I can't tell you the final destination, but I believe we've reached a leaf node in this hierarchy with our topic today.

On this, episode 118 of Talk Python To Me, with Ryan Scott Brown, we are going to explore serverless computing. It's an interesting paradigm shift and I hope you enjoy this conversation.

It was recorded May 24th, 2017.

Links from the show:

Ryan on Twitter: @ryan_sb
Ryan's site
Hello Retail
iRobot on Lambda
ZAPPA: Serverless Python Web Services
AWS Lambda
Building scikit-learn for AWS Lambda
Gone in 60 milliseconds
Ryan's course: The Serverless Framework with GraphQL
Ryan's course: AWS Lambda

Sponsored Links:
Talk Python Courses

June 27, 2017 08:00 AM

Daniel Bader

Linked Lists in Python

Linked Lists in Python

Learn how to implement a linked list data structure in Python, using only built-in data types and functionality from the standard library.

Python Linked Lists

Every Python programmer should know about linked lists:

They are among the simplest and most common data structures used in programming.

So, if you ever found yourself wondering, “Does Python have a built-in or ‘native’ linked list data structure?” or, “How do I write a linked list in Python?” then this tutorial will help you.

Python doesn’t ship with a built-in linked list data type in the “classical” sense. Python’s list type is implemented as a dynamic array—which means it doesn’t suit the typical scenarios where you’d want to use a “proper” linked list data structure for performance reasons.

Please note that this tutorial only considers linked list implementations that work on a “plain vanilla” Python install. I’m leaving out third-party packages intentionally. They don’t apply during coding interviews and it’s difficult to keep an up-to-date list that considers all packages available on Python packaging repositories.

Before we get into the weeds and look at linked list implementations in Python, let’s do a quick recap of what a linked list data structure is—and how it compares to an array.

What are the characteristics of a linked list?

A linked list is an ordered collection of values. Linked lists are similar to arrays in the sense that they contain objects in a linear order. However, they differ from arrays in their memory layout.

Arrays are contiguous data structures and they’re composed of fixed-size data records stored in adjoining blocks of memory. In an array, data is tightly packed—and we know the size of each data record which allows us to quickly look up an element given its index in the array:

Array Visualization

Linked lists, however, are made up of data records linked together by pointers. This means that the data records that hold the actual “data payload” can be stored anywhere in memory—what creates the linear ordering is how each data record “points” to the next one:

Linked List Visualization

There are two different kinds of linked lists: singly-linked lists and doubly-linked lists. What you saw in the previous example was a singly-linked list—each element in it has a reference (a “pointer”) to the next element in the list.

In a doubly-linked list, each element has a reference to both the next and the previous element. Why is this useful? Having a reference to the previous element can speed up some operations, like removing (“unlinking”) an element from a list or traversing the list in reverse order.

How do linked lists and arrays compare performance-wise?

You just saw how linked lists and arrays use different data layouts behind the scenes to store information. This data layout difference shows up in the performance characteristics of linked lists and arrays: an array gives you O(1) access to any element by index but needs O(n) time to insert or remove at the front, while a linked list gives you O(1) insertion and removal at the front but O(n) time to reach an element by index.

Now, how does this performance difference come into play with Python? Remember that Python’s built-in list type is in fact a dynamic array. This means the performance differences we just discussed apply to it. Likewise, Python’s immutable tuple data type can be considered a static array in this case—with similar performance trade-offs compared to a proper linked list.
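These trade-offs are easy to observe empirically. The exact numbers below are machine-dependent, and this is a rough sketch rather than a rigorous benchmark, but the ratio makes the point:

```python
import timeit

# Front-inserting into a dynamic array (Python list) shifts every
# existing element, so n front-inserts cost O(n^2) in total:
t_list = timeit.timeit('lst.insert(0, None)',
                       setup='lst = []', number=20000)

# A deque is a doubly-linked structure, so a front insert just
# links in a new node in O(1) time:
t_deque = timeit.timeit(
    'd.appendleft(None)',
    setup='import collections; d = collections.deque()',
    number=20000)

print('list.insert(0, ...) was {:.0f}x slower'.format(t_list / t_deque))
```

On any typical machine the list version is slower by a large factor, and the gap widens as the number of inserts grows.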

Does Python have a built-in or “native” linked list data structure?

Let’s come back to the original question. If you want to use a linked list in Python, is there a built-in data type you can use directly?

The answer is: “It depends.”

As of Python 3.6 (CPython), the language doesn’t provide a dedicated linked list data type. There’s nothing like Java’s LinkedList built into Python or into the Python standard library.

Python does however include the collections.deque class which provides a double-ended queue and is implemented as a doubly-linked list internally. Under some specific circumstances you might be able to use it as a “makeshift” linked list. If that’s not an option you’ll need to write your own linked list implementation from scratch.

How do I write a linked list using Python?

If you want to stick with functionality built into the core language and into the Python standard library you have two options for implementing a linked list:

  1. You can use the collections.deque class, which is backed by a doubly-linked list internally; or
  2. You can write your own linked list implementation from scratch.

Now that we’ve covered some general questions on linked lists and their availability in Python, read on for examples of how to make both of the above approaches work.

Option 1: Using collections.deque as a Linked List

This approach might seem a little odd at first because the collections.deque class implements a double-ended queue, and it’s typically used as the go-to stack or queue implementation in Python.

But using this class as a “makeshift” linked list might make sense under some circumstances. You see, CPython’s deque is powered by a doubly-linked list behind the scenes and it provides a full “list-like” set of functionality.

Under some circumstances, this makes treating deque objects as linked list replacements a valid option. Here are the key performance characteristics of this approach: inserting and removing elements at either end takes O(1) time, while inserting, removing, or searching at an arbitrary position takes O(n) time.

Using collections.deque as a linked list in Python can be a valid choice if you mostly care about insertion performance at the beginning or the end of the list, and you don’t need access to the previous-element and next-element pointers on each object directly.

Don’t use a deque if you need O(1) performance when removing elements. Removing elements by key or by index requires an O(n) search, even if you already have a reference to the element to be removed. This is the main downside of using a deque like a linked list.

If you’re looking for a linked list in Python because you want to implement queues or stacks, then a deque is a great choice, however.

Here are some examples on how you can use Python’s deque class as a replacement for a linked list:

>>> import collections
>>> lst = collections.deque()

# Inserting elements at the front
# or back takes O(1) time:
>>> lst.append('B')
>>> lst.append('C')
>>> lst.appendleft('A')
>>> lst
deque(['A', 'B', 'C'])

# However, inserting elements at
# arbitrary indexes takes O(n) time:
>>> lst.insert(2, 'X')
>>> lst
deque(['A', 'B', 'X', 'C'])

# Removing elements at the front
# or back takes O(1) time:
>>> lst.pop()
'C'
>>> lst.popleft()
'A'
>>> lst
deque(['B', 'X'])

# Removing elements at arbitrary
# indexes or by key takes O(n) time again:
>>> del lst[1]
>>> lst.remove('B')

# Deques can be reversed in-place:
>>> lst = collections.deque(['A', 'B', 'X', 'C'])
>>> lst.reverse()
>>> lst
deque(['C', 'X', 'B', 'A'])

# Searching for elements takes
# O(n) time:
>>> lst.index('X')
1

Option 2: Writing Your Own Python Linked Lists

If you need full control over the layout of each linked list node then there’s no perfect solution available in the Python standard library. If you want to stick with the standard library and built-in data types then writing your own linked list is your best bet.

You’ll have to make a choice between implementing a singly-linked or a doubly-linked list. I’ll give examples of both, including some of the common operations like how to search for elements, or how to reverse a linked list.

Let’s take a look at two concrete Python linked list examples. One for a singly-linked list, and one for a doubly-linked list.

✅ A Singly-Linked List Class in Python

Here’s how you might implement a class-based singly-linked list in Python, including some of the standard algorithms:

class ListNode:
    """
    A node in a singly-linked list.
    """
    def __init__(self, data=None, next=None):
        self.data = data
        self.next = next

    def __repr__(self):
        return repr(self.data)


class SinglyLinkedList:
    def __init__(self):
        """
        Create a new singly-linked list.
        Takes O(1) time.
        """
        self.head = None

    def __repr__(self):
        """
        Return a string representation of the list.
        Takes O(n) time.
        """
        nodes = []
        curr = self.head
        while curr:
            nodes.append(repr(curr))
            curr = curr.next
        return '[' + ', '.join(nodes) + ']'

    def prepend(self, data):
        """
        Insert a new element at the beginning of the list.
        Takes O(1) time.
        """
        self.head = ListNode(data=data, next=self.head)

    def append(self, data):
        """
        Insert a new element at the end of the list.
        Takes O(n) time.
        """
        if not self.head:
            self.head = ListNode(data=data)
            return
        curr = self.head
        while curr.next:
            curr = curr.next
        curr.next = ListNode(data=data)

    def find(self, key):
        """
        Search for the first element with `data` matching
        `key`. Return the element or `None` if not found.
        Takes O(n) time.
        """
        curr = self.head
        while curr and curr.data != key:
            curr = curr.next
        return curr  # Will be None if not found

    def remove(self, key):
        """
        Remove the first occurrence of `key` in the list.
        Takes O(n) time.
        """
        # Find the element and keep a
        # reference to the element preceding it
        curr = self.head
        prev = None
        while curr and curr.data != key:
            prev = curr
            curr = curr.next
        if curr:
            # Unlink it from the list
            if prev:
                prev.next = curr.next
            else:
                self.head = curr.next
            curr.next = None

    def reverse(self):
        """
        Reverse the list in-place.
        Takes O(n) time.
        """
        curr = self.head
        prev_node = None
        next_node = None
        while curr:
            next_node = curr.next
            curr.next = prev_node
            prev_node = curr
            curr = next_node
        self.head = prev_node

And here’s how you’d use this linked list class in practice:

>>> lst = SinglyLinkedList()
>>> lst
[]

>>> lst.prepend(23)
>>> lst.prepend('a')
>>> lst.prepend(42)
>>> lst.prepend('X')
>>> lst.append('the')
>>> lst.append('end')

>>> lst
['X', 42, 'a', 23, 'the', 'end']

>>> lst.find('X')
'X'
>>> lst.find('y')

>>> lst.reverse()
>>> lst
['end', 'the', 23, 'a', 42, 'X']

>>> lst.remove(42)
>>> lst
['end', 'the', 23, 'a', 'X']

>>> lst.remove('not found')

Note that removing an element in this implementation is still an O(n) time operation, even if you already have a reference to a ListNode object.

In a singly-linked list, removing an element typically requires searching the list, because we need to know the previous and the next element. With a doubly-linked list you could write a remove_elem() method that unlinks and removes a node from the list in O(1) time.

✅ A Doubly-Linked List Class in Python

Let’s have a look at how to implement a doubly-linked list in Python. The following DoublyLinkedList class should point you in the right direction:

class DListNode:
    """
    A node in a doubly-linked list.
    """
    def __init__(self, data=None, prev=None, next=None):
        self.data = data
        self.prev = prev
        self.next = next

    def __repr__(self):
        return repr(self.data)


class DoublyLinkedList:
    def __init__(self):
        """
        Create a new doubly-linked list.
        Takes O(1) time.
        """
        self.head = None

    def __repr__(self):
        """
        Return a string representation of the list.
        Takes O(n) time.
        """
        nodes = []
        curr = self.head
        while curr:
            nodes.append(repr(curr))
            curr = curr.next
        return '[' + ', '.join(nodes) + ']'

    def prepend(self, data):
        """
        Insert a new element at the beginning of the list.
        Takes O(1) time.
        """
        new_head = DListNode(data=data, next=self.head)
        if self.head:
            self.head.prev = new_head
        self.head = new_head

    def append(self, data):
        """
        Insert a new element at the end of the list.
        Takes O(n) time.
        """
        if not self.head:
            self.head = DListNode(data=data)
            return
        curr = self.head
        while curr.next:
            curr = curr.next
        curr.next = DListNode(data=data, prev=curr)

    def find(self, key):
        """
        Search for the first element with `data` matching
        `key`. Return the element or `None` if not found.
        Takes O(n) time.
        """
        curr = self.head
        while curr and curr.data != key:
            curr = curr.next
        return curr  # Will be None if not found

    def remove_elem(self, node):
        """
        Unlink an element from the list.
        Takes O(1) time.
        """
        if node.prev:
            node.prev.next = node.next
        if node.next:
            node.next.prev = node.prev
        if node is self.head:
            self.head = node.next
        node.prev = None
        node.next = None

    def remove(self, key):
        """
        Remove the first occurrence of `key` in the list.
        Takes O(n) time.
        """
        elem = self.find(key)
        if not elem:
            return
        self.remove_elem(elem)

    def reverse(self):
        """
        Reverse the list in-place.
        Takes O(n) time.
        """
        curr = self.head
        prev_node = None
        while curr:
            prev_node = curr.prev
            curr.prev = curr.next
            curr.next = prev_node
            curr = curr.prev
        if prev_node:
            self.head = prev_node.prev

Here are a few examples on how to use this class. Notice how we can now remove elements in O(1) time with the remove_elem() function if we already hold a reference to the list node representing the element:

>>> lst = DoublyLinkedList()
>>> lst
[]

>>> lst.prepend(23)
>>> lst.prepend('a')
>>> lst.prepend(42)
>>> lst.prepend('X')
>>> lst.append('the')
>>> lst.append('end')

>>> lst
['X', 42, 'a', 23, 'the', 'end']

>>> lst.find('X')
'X'
>>> lst.find('y')

>>> lst.reverse()
>>> lst
['end', 'the', 23, 'a', 42, 'X']

>>> elem = lst.find(42)
>>> lst.remove_elem(elem)

>>> lst.remove('X')
>>> lst.remove('not found')
>>> lst
['end', 'the', 23, 'a']

Both examples of Python linked lists you saw here were class-based. An alternative approach would be to implement a Lisp-style linked list in Python using tuples as the core building blocks (“cons pairs”). Here’s a tutorial that goes into more detail: Functional Linked Lists in Python.
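As a taste of that style, here's a tiny sketch of tuple-based cons pairs. The helper names below are my own toy choices, not taken from the linked tutorial:

```python
def cons(x, rest=None):
    """Build a cons pair: (head, rest-of-list). None is the empty list."""
    return (x, rest)

def to_list(cell):
    """Walk a cons chain and collect the values into a Python list."""
    out = []
    while cell is not None:
        head, cell = cell
        out.append(head)
    return out

lst = cons(1, cons(2, cons(3)))
print(to_list(lst))  # [1, 2, 3]
```

Because each cell is an immutable tuple, "modifying" such a list means building a new chain that shares structure with the old one, which is exactly the functional flavor the linked tutorial explores.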

Python Linked Lists: Recap & Recommendations

We just looked at a number of approaches to implement a singly- and doubly-linked list in Python. You also saw some code examples of the standard operations and algorithms, for example how to reverse a linked list in-place.

You should only consider using a linked list in Python when you’ve determined that you absolutely need a linked data structure for performance reasons (or you’ve been asked to use one in a coding interview.)

In many cases the same algorithm implemented on top of Python’s highly optimized list objects will be sufficiently fast. If you know a dynamic array won’t cut it and you need a linked list, then check first if you can take advantage of Python’s built-in deque class.

If none of these options work for you, and you want to stay within the standard library, only then should you write your own Python linked list.

In an interview situation I’d also advise you to write your own implementation from scratch because that’s usually what the interviewer wants to see. However it can be beneficial to mention that collections.deque offers similar performance under the right circumstances. Good luck and…Happy Pythoning!

Read the full “Fundamental Data Structures in Python” article series here. This article is missing something or you found an error? Help a brother out and leave a comment below.

June 27, 2017 12:00 AM

June 26, 2017

Damián Avila

Trading logbook update 3

OK, I have run my models again and it was time to enter the market.

Early today, I opened two positions:

Read more… (1 min remaining to read)

June 26, 2017 09:29 PM

Weekly Python Chat


How should you count the number of times each thing occurs in a list? There are a lot of ways to solve this problem, but the most Pythonic is often to use Counter.

Let's talk about different ways of counting things in a list, including Counter!
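As a quick preview of the Counter approach (the word list here is just made-up sample data):

```python
from collections import Counter

words = ['apple', 'banana', 'apple', 'cherry', 'banana', 'apple']

# Counter builds a dict-like mapping of item -> occurrence count
counts = Counter(words)

print(counts['apple'])        # 3
print(counts.most_common(2))  # [('apple', 3), ('banana', 2)]
```

Unlike a hand-rolled dict-of-counts loop, Counter handles missing keys (they count as 0) and gives you most_common() for free.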

June 26, 2017 06:00 PM

Shopkick Tech Blog

Seashore: A Collection of Shell Abstractions

For many months now, Shopkick has been undergoing an infrastructure revolution. Modern infrastructure means automation, and we find ourselves automating UNIX-like systems a lot. We like UNIX, because you can do amazingly powerful things in just a couple lines:

eval $(docker-machine env default)
docker run --rm /opt/python/cp27mu/bin/python --version

While we like UNIX, we love Python. Unfortunately, the equivalent Python code gets quite a bit heavier using only the standard library. First we would have to manually parse the output from docker-machine env, and then write a long subprocess.check_call([...]).
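To make that concrete, here's roughly what the standard-library-only version involves. The parsing helper below is my own sketch of the idea, not Shopkick's actual code:

```python
import os
import re
import subprocess

def parse_docker_machine_env(output):
    """Extract the KEY="value" pairs that `docker-machine env` prints
    as shell `export` statements."""
    return dict(re.findall(r'export (\w+)="([^"]*)"', output))

def run_in_machine(argv):
    """Run a `docker` command against the 'default' docker-machine,
    doing by hand what `eval $(docker-machine env default)` does."""
    env_output = subprocess.check_output(
        ['docker-machine', 'env', 'default']).decode()
    env = dict(os.environ, **parse_docker_machine_env(env_output))
    return subprocess.check_call(['docker', 'run', '--rm'] + argv, env=env)
```

Two helper functions and a regex, just to replicate two lines of shell.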

Is it possible to have our UNIX cake and elegantly eat it in Python? We want the formality of Python, with its ability to handle, say, file names with spaces in them and the ease of writing command-lines in shell. And can we do even better, and generate code that is amenable to unit testing, something which the UNIX shell provides no straightforward facility?

We certainly think so!

Just in time for summer, Seashore is a collection of elegant shell abstractions with an eye toward brevity and testability.

Using Seashore, the code above can be rewritten as:

from seashore import Executor, Shell, NO_VALUE
xctr = Executor(Shell())
dm_xctr = xctr.in_docker_machine('default')
version = dm_xctr.docker.run(
    '', '/opt/python/cp27-cp27mu/bin/python',
    '--version', remove=NO_VALUE).batch()[0].strip()
print("version is {}".format(version))

Since all of the low-level work of running a command line is done in Shell(), it is reasonably easy to convert this into unit-testable code:

from seashore import Executor, Shell, NO_VALUE

def get_python_27_version(shell=None):
    xctr = Executor(shell or Shell())
    dm_xctr = xctr.in_docker_machine('default')
    version = dm_xctr.docker.run(
        '', '/opt/python/cp27-cp27mu/bin/python',
        '--version', remove=NO_VALUE).batch()[0].strip()
    return version

Now, we can unit test with something along the lines of

import unittest, mock
import python_version

class TestPythonVersion(unittest.TestCase):
    def test_basic(self):
        shell = mock.Mock()
        def side_effect(cmd, **kwargs):
            if cmd[0] == 'docker-machine':
                # (The mocked `docker-machine env` output is elided here)
                return '', ''
            elif cmd[0] == 'docker':
                return '2.7.1242', ''
        shell.batch.side_effect = side_effect
        version = python_version.get_python_27_version(shell)
        self.assertEqual(version, '2.7.1242')
        # Assert that shell.batch was called with the right arguments

Seashore has native support for common Python and Python-adjacent tools, including git, pip, virtualenv, conda, docker, and more! Plus, it can be easily extended to support all your automation needs.

Seashore is available on GitHub and PyPI, where it's released under the MIT license.

Documentation is available on Read the Docs. Development is all public, and very active right now. With seashore already being used in production, all issues and pull requests are more than welcome!

Seashore is just the first of many Shopkick open-source releases we hope to bring you here on Be sure to subscribe for more!

June 26, 2017 03:45 PM

Caktus Consulting Group

5 Ways to Deploy Your Python Web App in 2017 (PyCon 2017 Must-See Talk 4/6)

Part four of six in the 2017 edition of our annual PyCon Must-See Series, highlighting the talks our staff especially loved at PyCon. While there were many great talks, this is our team's shortlist.

I went into Andrew T Baker’s talk on deploying Python applications with some excitement about learning some new deployment methods. I had no idea that Andrew was going to deploy a simple “Hello World” app live, in 5 different ways!

  1. First up, Andrew used ngrok to expose localhost running on his machine to the web. I’ve used ngrok before to share with QA, but never thought about using it to share with a client. Interesting!

  2. Heroku was up next with a gunicorn Python web app server, with a warning that scaling is costly after the one free app per account.

  3. The third deploy was “Serverless” with an example via AWS Lambda, although many other serverless options exist.

  4. The next deploy was described as the way most web shops deploy, via virtual machines. The example deploy was done over the Google Cloud Platform, but another popular method for this is via Amazon EC2. This method is fairly manual, Andrew explained, with a need to Secure Shell (SSH) into your server after you spin it up.

  5. The final deploy was done via Docker with a warning that it is still fairly new and there isn't as much documentation available.
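For context, a “Hello World” web app like the one Andrew deployed can be as small as this plain-WSGI sketch (my own reconstruction; his actual app may differ). Something like `gunicorn hello:app` would serve it in the Heroku setup above:

```python
# hello.py -- a minimal WSGI "Hello World" app, deployable by any
# of the five methods above without any framework dependencies.
def app(environ, start_response):
    # Every request gets the same plain-text response.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'Hello, World!\n']
```

The app is just a callable taking the WSGI environ and a start_response callback, which is why the same file works unchanged across ngrok, Heroku, Lambda (behind an adapter), a VM, or Docker.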

I am planning to rewatch Andrew’s talk and follow along on my machine. I’m excited to see what I can do.

June 26, 2017 01:30 PM

Doug Hellmann

decimal — Fixed and Floating Point Math — PyMOTW 3

The decimal module implements fixed and floating point arithmetic using the model familiar to most people, rather than the IEEE floating point version implemented by most computer hardware and familiar to programmers. A Decimal instance can represent any number exactly, round up or down, and apply a limit to the number of significant digits. Read … Continue reading decimal — Fixed and Floating Point Math — PyMOTW 3
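A quick taste of the difference before you read the full post:

```python
from decimal import Decimal, getcontext

# Binary floats accumulate representation error:
print(0.1 + 0.2)                        # 0.30000000000000004

# Decimal values behave like the arithmetic most people expect:
print(Decimal('0.1') + Decimal('0.2'))  # 0.3

# The context limits the number of significant digits:
getcontext().prec = 4
print(Decimal(1) / Decimal(3))          # 0.3333
```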

June 26, 2017 01:00 PM

"Fredrik Håård's Blaag"

Blaag Migrated

So I've migrated the Blaag to Hugo, and merged it with my homepage in an attempt to get back into blogging a bit and maybe even keep the homepage updated every now and then.

I wrote a script to do some heavy lifting migrating the sometimes not quite correct RST files to org-mode as well, since I already do almost everything else in org-mode, and never use RST at all anymore.

Hopefully the various redirects etc. work and there'll be no broken links as a result of this!

June 26, 2017 12:53 PM