
Planet Python

Last update: August 19, 2017 04:48 AM

August 18, 2017


Simple is Better Than Complex

How to Render Django Form Manually

Dealing with user input is a very common task in any Web application or Web site. The standard way to do it is through HTML forms, where the user inputs some data, submits it to the server, and then the server does something with it. Now, the chances are that you might have already heard the quote: “All input is evil!” I don’t know who said that first, but it was very well said. The truth is, every input in your application is a door, a potential attack vector. So you’d better secure all doors! To make your life easier, and to give you some peace of mind, Django offers a very rich, reliable and secure forms API. And you should definitely use it, no matter how simple your HTML form is.

Managing user input and processing forms is a fairly complex task, because it involves interacting with many layers of your application. It has to access the database; clean, validate, transform, and guarantee the integrity of the data; sometimes it needs to interact with multiple models, communicate human-readable error messages, and finally it also has to translate all the Python code that represents your models into HTML inputs. In some cases, those HTML inputs may involve JavaScript and CSS code (a custom date picker, or an auto-complete field, for example).

The thing is, Django does the server-side part very well. But it doesn’t mess much with the client-side part. The HTML forms automatically generated by Django are fully functional and can be used as they are. But they’re very crude: just plain HTML, no CSS and no JavaScript. It was done that way so you can have total control over how to present the forms, to match your application’s Web design. The server side is a little bit different, as things are more standardized there, so most of the functionality offered by the forms API works out of the box. And for the special cases, it provides many ways to customize things.

In this tutorial I will show you how to work with the rendering part, using custom CSS and making your forms prettier.



Working Example

Throughout the whole tutorial I will be using the following form definition to illustrate the examples:

forms.py

from django import forms

class ContactForm(forms.Form):
    name = forms.CharField(max_length=30)
    email = forms.EmailField(max_length=254)
    message = forms.CharField(
        max_length=2000,
        widget=forms.Textarea(),
        help_text='Write here your message!'
    )
    source = forms.CharField(       # A hidden input for internal use, telling
        max_length=50,              # from which page the user sent the message
        widget=forms.HiddenInput()
    )

    def clean(self):
        cleaned_data = super(ContactForm, self).clean()
        name = cleaned_data.get('name')
        email = cleaned_data.get('email')
        message = cleaned_data.get('message')
        if not name and not email and not message:
            raise forms.ValidationError('You have to write something!')

And the following view, just to load the form and trigger the validation process, so we can see the form in different states:

views.py

from django.shortcuts import render
from .forms import ContactForm

def home(request):
    if request.method == 'POST':
        form = ContactForm(request.POST)
        if form.is_valid():
            pass  # does nothing, just triggers the validation
    else:
        form = ContactForm()
    return render(request, 'home.html', {'form': form})

Understanding the Rendering Process

In many tutorials or in the official Django documentation, it’s very common to see form templates like this:

<form method="post" novalidate>
  {% csrf_token %}
  {{ form }}
  <button type="submit">Submit</button>
</form>
Note: Maybe you are wondering about the novalidate attribute in the form. In a real case you probably won't want to use it. It prevents the browser from "validating" the data before submitting it to the server. Since the examples we are going to explore only produce "required" field errors, browser validation would prevent us from seeing the actual server-side data validation and exploring the error states of the form rendering.

It looks like magic, right? Because this particular form may contain 50 fields, and the simple command {{ form }} will render them all in the template.

When we write {{ form }} in a template, it’s actually accessing the __str__ method from the BaseForm class. The __str__ method is used to provide a string representation of an object. If you have a look at the source code, you will see that it returns the as_table() method. So, basically, {{ form }} and {{ form.as_table }} are the same thing.
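In the Django source at the time of writing, this boils down to a one-line method (a paraphrased sketch, not a verbatim copy):

class BaseForm:
    # ...
    def __str__(self):
        return self.as_table()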

The forms API offers three methods to automatically render the HTML form: as_table(), as_p() and as_ul().

They all work more or less in the same way; the difference is the HTML code that wraps the inputs.
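If you want to compare the three renderings quickly, you can print them from the Django shell. A minimal sketch (the import path is hypothetical and the HTML output is abridged):

>>> from myapp.forms import ContactForm  # hypothetical import path
>>> form = ContactForm()
>>> print(form.as_table())  # <tr>/<th>/<td> rows, no surrounding <table> tag
>>> print(form.as_p())      # each field wrapped in a <p> tag
>>> print(form.as_ul())     # <li> items, no surrounding <ul> tag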

Below is the result of the previous code snippet:

Contact Form

But if {{ form }} and {{ form.as_table }} are the same thing, the output definitely doesn’t look like a table, right? That’s because the as_table() and as_ul() methods don’t create the <table> and <ul> tags, so we have to add them ourselves.

So, the correct way to do it would be:

<form method="post" novalidate>
  {% csrf_token %}
  <table border="1">
    {{ form }}
  </table>
  <button type="submit">Submit</button>
</form>

Contact Form

Now it makes sense, right? Without the <table> tag the browser doesn’t really know how to render the HTML output, so it just presents all the visible fields in line, since we don’t have any CSS yet.

If you have a look at the _html_output private method defined in BaseForm, which is used by all the as_*() methods, you will see that it’s a fairly complex method, with 76 lines of code, and it does lots of things. That’s okay, because this method is well tested and it’s part of the core of the forms API, the underlying mechanics that make things work. When working on your own form rendering logic you won’t need to write Python code to do the job. It’s much better to do it using the Django template engine, as you can achieve cleaner and easier-to-maintain code.

I’m mentioning the _html_output method here because we can use it to analyze what kind of code it’s generating and what it’s really doing, so we can mimic it using the template engine. It’s also a very good exercise to read the source code and get more comfortable with it. It’s a great source of information. Even though Django’s documentation is very detailed and extensive, there are always some hidden bits here and there. You also get the chance to see, through examples, how smart coders solved specific problems. After all, it’s an open source project with a mature development process to which many have contributed, so chances are you are reading well-crafted code.

Anyway, here it is, in a nutshell, what _html_output does: it gathers the non-field errors (plus the errors of any hidden fields) at the top of the form, renders each visible field with its label, errors and help text inside the row markup of the chosen as_*() method, and appends the hidden fields themselves without labels.

Here is what the second state of the form looks like, triggering all the validation errors:

Contact Form With Errors

Now that we know what it’s doing, we can try to mimic the same behavior using the template engine. This way, we will have much more control over the rendering process:

<form method="post" novalidate>
  {% csrf_token %}

  {{ form.non_field_errors }}

  {% for hidden_field in form.hidden_fields %}
    {{ hidden_field.errors }}
    {{ hidden_field }}
  {% endfor %}

  <table border="1">
    {% for field in form.visible_fields %}
      <tr>
        <th>{{ field.label_tag }}</th>
        <td>
          {{ field.errors }}
          {{ field }}
          {{ field.help_text }}
        </td>
      </tr>
    {% endfor %}
  </table>

  <button type="submit">Submit</button>
</form>

You will notice that the result is slightly different, but all the elements are there. The thing is, the automatic generation of the HTML using just {{ form }} takes advantage of the Python language, so it can play with string concatenation, joining lists (non-field errors + hidden field errors), and that sort of thing. The template engine is more limited and restrictive, but that’s not an issue. I like the Django template engine because it doesn’t let you put much code logic in the template.

Contact Form With Errors

The only real issue is the stray “This field is required” at the top, which refers to the source field. But we can improve on that. Let’s keep expanding the form rendering, so we get even more control over it:

<form method="post" novalidate>
  {% csrf_token %}

  {% if form.non_field_errors %}
    <ul>
      {% for error in form.non_field_errors %}
        <li>{{ error }}</li>
      {% endfor %}
    </ul>
  {% endif %}

  {% for hidden_field in form.hidden_fields %}
    {% if hidden_field.errors %}
      <ul>
        {% for error in hidden_field.errors %}
          <li>(Hidden field {{ hidden_field.name }}) {{ error }}</li>
        {% endfor %}
      </ul>
    {% endif %}
    {{ hidden_field }}
  {% endfor %}

  <table border="1">
    {% for field in form.visible_fields %}
      <tr>
        <th>{{ field.label_tag }}</th>
        <td>
          {% if field.errors %}
            <ul>
              {% for error in field.errors %}
                <li>{{ error }}</li>
              {% endfor %}
            </ul>
          {% endif %}
          {{ field }}
          {% if field.help_text %}
            <br />{{ field.help_text }}
          {% endif %}
        </td>
      </tr>
    {% endfor %}
  </table>

  <button type="submit">Submit</button>
</form>

Contact Form With Errors

Much closer, right?

Now that we know how to “expand” the {{ form }} markup, let’s try to make it look prettier. Perhaps using the Bootstrap 4 library.


Accessing the Form Fields Individually

We don’t need a for loop to expose the form fields, but it’s a very convenient way to do it, especially if you don’t have any special requirements for the positioning of the elements.

Here is how we can refer to the form fields one by one:

<form method="post" novalidate>
  {% csrf_token %}

  {{ form.non_field_errors }}

  {{ form.source.errors }}
  {{ form.source }}

  <table border="1">

      <tr>
        <th>{{ form.name.label_tag }}</th>
        <td>
          {{ form.name.errors }}
          {{ form.name }}
        </td>
      </tr>

      <tr>
        <th>{{ form.email.label_tag }}</th>
        <td>
          {{ form.email.errors }}
          {{ form.email }}
        </td>
      </tr>

      <tr>
        <th>{{ form.message.label_tag }}</th>
        <td>
          {{ form.message.errors }}
          {{ form.message }}
          <br />
          {{ form.message.help_text }}
        </td>
      </tr>

  </table>

  <button type="submit">Submit</button>
</form>

It’s not a very DRY solution, but it’s good to know how to do it. Sometimes you may have a very specific use case where you need to position the fields in the HTML yourself.


Expanding the Form Fields

We can still dig deeper and expand the {{ field }} markup (or, if you are doing it individually, the {{ form.name }} or {{ form.email }} fields, for example). But now things get a little more complex, because we are talking about the widgets. For example, the name field translates into an <input type="text"> tag, the email field translates into an <input type="email"> tag, and, even more problematic, the message field translates into a <textarea></textarea> tag.

At this point, Django makes use of small HTML templates to generate the output HTML of the fields.

So let’s see how Django does it. If we open the text.html or the email.html templates from the widgets folder, we will see that they simply include the input.html template file:

{% include "django/forms/widgets/input.html" %}

This suggests the input.html template is probably the most generic one; the specifics of the rendering are likely inside it. So, let’s have a look:

<input type="{{ widget.type }}"
       name="{{ widget.name }}"
       {% if widget.value != None %} value="{{ widget.value|stringformat:'s' }}"{% endif %}
       {% include "django/forms/widgets/attrs.html" %} />

Basically, this small template sets the input type and its name, which is used to access the data in the request object. For example, an input with name “message”, if posted to the server, is accessible via request.POST['message'].
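As a quick illustration of that name-to-key mapping, here is a minimal, hypothetical view (not part of this article's example project):

from django.http import HttpResponse

def contact(request):
    if request.method == 'POST':
        # The value typed into the widget named "message" arrives under that key:
        message = request.POST.get('message', '')
        return HttpResponse('Received %d characters' % len(message))
    return HttpResponse('Send a POST request')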

Still on the input.html template snippet, it also sets the current value of the field, or leaves it empty if there is no data. This is an important bit in the template, because that’s what keeps the state of the form after it has been submitted but not successfully processed (i.e., the form was invalid).

Finally, it includes the attrs.html template, which is responsible for setting attributes such as maxlength, required, placeholder, style, or any other HTML attribute. It’s highly customizable in the form definition.

If you are curious about the attrs.html, here is what it looks like:

{% for name, value in widget.attrs.items %}
  {% if value is not False %}
    {{ name }}{% if value is not True %}="{{ value|stringformat:'s' }}"{% endif %}
  {% endif %}
{% endfor %}

Now, if you really want to create the inputs by yourself, you can do it like this (just the name field, for brevity):

<input type="text"
       name="name"
       id="id_name"
       {% if form.name.value != None %}value="{{ form.name.value|stringformat:'s' }}"{% endif %}
       maxlength="30"
       required>

Or a little bit better:

<input type="text"
       name="{{ form.name.name }}"
       id="{{ form.name.id_for_label }}"
       {% if form.name.value != None %}value="{{ form.name.value|stringformat:'s' }}"{% endif %}
       maxlength="{{ form.name.field.max_length }}"
       {% if form.name.field.required %}required{% endif %}>

You probably already figured out that this is not the best way to work with forms. And maybe you are also asking yourself why we sometimes refer to a certain attribute as {{ form.name.<something> }} and in other situations as {{ form.name.field.<something> }}.

I don’t want to go into much detail right now, but basically form.name is a BoundField (field + data) instance, and form.name.field is the field definition, which is an instance of forms.CharField. That’s why some values are available on the bound field instance, and others on the char field definition.

In any form definition, the form’s __iter__ returns a list of BoundField instances; similarly, the visible_fields() and hidden_fields() methods also return BoundField instances. form.fields, on the other hand, holds the field definitions themselves: CharField, EmailField, and so on. If that’s too much information right now, it’s okay, you don’t have to worry about it yet.
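A quick Django shell session makes the distinction concrete (a sketch using the ContactForm defined earlier; output abridged):

>>> form = ContactForm()
>>> bound = form['name']        # the same object the template sees as form.name
>>> type(bound).__name__
'BoundField'
>>> bound.value()               # data-related: current value (None for an unbound form)
>>> bound.field.max_length      # definition-related: lives on the CharField
30
>>> [bf.name for bf in form]    # iterating the form yields BoundFields
['name', 'email', 'message', 'source']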


Using Custom HTML Attributes

There are some cases where you only want to add an extra HTML attribute, like a class, a style, or a placeholder. You don’t need to expand the input field like we did in the previous example; you can do it directly in the form definition:

forms.py

class ColorfulContactForm(forms.Form):
    name = forms.CharField(
        max_length=30,
        widget=forms.TextInput(
            attrs={
                'style': 'border-color: blue;',
                'placeholder': 'Write your name here'
            }
        )
    )
    email = forms.EmailField(
        max_length=254,
        widget=forms.EmailInput(attrs={'style': 'border-color: green;'})
    )
    message = forms.CharField(
        max_length=2000,
        widget=forms.Textarea(attrs={'style': 'border-color: orange;'}),
        help_text='Write here your message!'
    )

Colorful Contact Form

Next, we are going to explore a third-party library that can make your life easier.


Using Django Widget Tweaks

Even though we can control the custom HTML attributes in the form definition, it would be much better if we could set them directly in the template. After all, the HTML attributes refer to the presentation of the inputs.

The django-widget-tweaks library is the right tool for the job. It lets you keep the form defaults and just add what you need. It’s very convenient, especially when working with ModelForms, as it reduces the amount of code you have to write to accomplish simple tasks.

I’m not going into much detail about django-widget-tweaks here because I have an article dedicated to it: How to use django-widget-tweaks.

Here’s a quick get started guide:

First, install it using pip:

pip install django-widget-tweaks

Add it to the INSTALLED_APPS:

INSTALLED_APPS = [
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',

    'widget_tweaks',
]

Load it in the template:

{% load widget_tweaks %}
<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>Simple is Better Than Complex</title>
</head>
<body>
  ...
</body>

And we are ready to use it! Basically, we will use the {% render_field %} template tag. You will see in the next example that we can simply set the attributes just like we would with raw HTML:

<form method="post" novalidate>
  {% csrf_token %}

  {{ form.non_field_errors }}

  {% for hidden_field in form.hidden_fields %}
    {{ hidden_field.errors }}
    {{ hidden_field }}
  {% endfor %}

  <table border="1">
    {% for field in form.visible_fields %}
      <tr>
        <th>{{ field.label_tag }}</th>
        <td>
          {{ field.errors }}
          {% render_field field style="border: 2px dashed red;" placeholder=field.name %}
          {{ field.help_text }}
        </td>
      </tr>
    {% endfor %}
  </table>

  <button type="submit">Submit</button>
</form>

Django Widget Tweaks Form

It’s very handy, especially for the cases where you just need to add a CSS class, which is exactly what the Bootstrap 4 form templates require.


Rendering Bootstrap 4 Forms

Basically, to use the Bootstrap 4 library, I just included the CDN link they provide in my template:

<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
  <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0-beta/css/bootstrap.min.css" integrity="sha384-/Y6pD6FV/Vv2HJnA6t+vslU6fwYXjCFtcEpHbNJ0lyAFsXTsjBbfaDjzALeQsN6M" crossorigin="anonymous">
  <title>Simple is Better Than Complex</title>
</head>

This part of the article will be more to-the-point, as I won’t explore the particularities of the Bootstrap 4 implementation. Their documentation is great and rich in examples. If you are not very familiar with it, you can jump to the Documentation / Components / Forms section for further information.

Let’s first focus on the presentation of the inputs; we will get to the errors part later. Here is how we can present the same form using Bootstrap 4 markup:

<form method="post" novalidate>
  {% csrf_token %}

  {% for hidden_field in form.hidden_fields %}
    {{ hidden_field }}
  {% endfor %}

  {% for field in form.visible_fields %}
    <div class="form-group">
      {{ field.label_tag }}
      {{ field }}
      {% if field.help_text %}
        <small class="form-text text-muted">{{ field.help_text }}</small>
      {% endif %}
    </div>
  {% endfor %}

  <button type="submit" class="btn btn-primary">Submit</button>
</form>

Bootstrap 4 Contact Form

The input fields look broken, though. That’s because Bootstrap 4 forms expect the CSS class form-control on the HTML inputs. Let’s fix that with what we learned in the last section of this article:

{% load widget_tweaks %}

<form method="post" novalidate>
  {% csrf_token %}

  {% for hidden_field in form.hidden_fields %}
    {{ hidden_field }}
  {% endfor %}

  {% for field in form.visible_fields %}
    <div class="form-group">
      {{ field.label_tag }}
      {% render_field field class="form-control" %}
      {% if field.help_text %}
        <small class="form-text text-muted">{{ field.help_text }}</small>
      {% endif %}
    </div>
  {% endfor %}

  <button type="submit" class="btn btn-primary">Submit</button>
</form>

Bootstrap 4 Contact Form

Much better. Now let’s look at the validation and error handling. I’m going to use an alert component for the non-field errors, and for the fields I will just play with the CSS classes that Bootstrap 4 provides.

{% load widget_tweaks %}

<form method="post" novalidate>
  {% csrf_token %}

  {% for hidden_field in form.hidden_fields %}
    {{ hidden_field }}
  {% endfor %}

  {% if form.non_field_errors %}
    <div class="alert alert-danger" role="alert">
      {% for error in form.non_field_errors %}
        {{ error }}
      {% endfor %}
    </div>
  {% endif %}

  {% for field in form.visible_fields %}
    <div class="form-group">
      {{ field.label_tag }}

      {% if form.is_bound %}
        {% if field.errors %}
          {% render_field field class="form-control is-invalid" %}
          {% for error in field.errors %}
            <div class="invalid-feedback">
              {{ error }}
            </div>
          {% endfor %}
        {% else %}
          {% render_field field class="form-control is-valid" %}
        {% endif %}
      {% else %}
        {% render_field field class="form-control" %}
      {% endif %}

      {% if field.help_text %}
        <small class="form-text text-muted">{{ field.help_text }}</small>
      {% endif %}
    </div>
  {% endfor %}

  <button type="submit" class="btn btn-primary">Submit</button>
</form>

And here is the result:

Bootstrap 4 Contact Form

It’s very cool because it marks the fields that passed validation in green:

Bootstrap 4 Contact Form

Let’s have a closer look at what’s going on. We could improve this code snippet, but I preferred to keep it this way so you can get a better idea of the template rendering logic.

First, I check form.is_bound. It tells us whether the form has data or not. When we first initialize the form with form = ContactForm(), is_bound is False. After a submission with form = ContactForm(request.POST), it is True. So we can use it to know whether the validation process has already happened.

Then, once validation has occurred, I simply mark the field with the CSS class .is-invalid or .is-valid, depending on the case. These classes are responsible for painting the form components red or green.
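A minimal shell check of that behavior (a sketch):

>>> ContactForm().is_bound          # no data passed to the constructor
False
>>> ContactForm(data={}).is_bound   # any data dict, even an empty one, binds the form
True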


Reusing Form Components

One thing we can do now is copy the existing code to an external file and reuse the snippet for other forms.

includes/bs4_form.html

{% load widget_tweaks %}

{% for hidden_field in form.hidden_fields %}
  {{ hidden_field }}
{% endfor %}

{% if form.non_field_errors %}
  <div class="alert alert-danger" role="alert">
    {% for error in form.non_field_errors %}
      {{ error }}
    {% endfor %}
  </div>
{% endif %}

{% for field in form.visible_fields %}
  <div class="form-group">
    {{ field.label_tag }}

    {% if form.is_bound %}
      {% if field.errors %}
        {% render_field field class="form-control is-invalid" %}
        {% for error in field.errors %}
          <div class="invalid-feedback">
            {{ error }}
          </div>
        {% endfor %}
      {% else %}
        {% render_field field class="form-control is-valid" %}
      {% endif %}
    {% else %}
      {% render_field field class="form-control" %}
    {% endif %}

    {% if field.help_text %}
      <small class="form-text text-muted">{{ field.help_text }}</small>
    {% endif %}
  </div>
{% endfor %}

Now our form template could be as simple as:

<form method="post" novalidate>
  {% csrf_token %}
  {% include 'includes/bs4_form.html' with form=form %}
  <button type="submit" class="btn btn-primary">Submit</button>
</form>

For example, we can use the snippet above to render the UserCreationForm, a built-in form that lives inside the django.contrib.auth module. Below, the result:

Bootstrap 4 Contact Form
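For reference, a minimal sketch of the view side (the template name, URL name and wiring are assumptions, not from the original project):

from django.contrib.auth.forms import UserCreationForm
from django.shortcuts import redirect, render

def signup(request):
    if request.method == 'POST':
        form = UserCreationForm(request.POST)
        if form.is_valid():
            form.save()
            return redirect('home')  # hypothetical URL name
    else:
        form = UserCreationForm()
    # signup.html would use the same {% include 'includes/bs4_form.html' %} partial
    return render(request, 'signup.html', {'form': form})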


Conclusions

This article became bigger than I anticipated. I first thought about writing just a quick tutorial about form rendering; then I remembered that I already had a to-the-point tutorial explaining how to use django-widget-tweaks. So instead, I decided to dive into the details and explore some of the mechanics of the forms API.

I will have a follow-up article focusing on complex forms: rendering checkboxes, select fields and date pickers together, and also developing your own custom widgets.

I hope you learned something new or enjoyed reading this article. If you have any questions or want to discuss the topic further, please leave a comment below!

As usual, you can find the source code and all the examples on GitHub.

August 18, 2017 09:00 PM


Catalin George Festila

The Google Cloud SDK - part 002 .

The next part of my tutorials about the Google Cloud SDK comes with some information about the project.
As you know, I used the default App Engine hello world standard sample application.
The goal is to understand how it works by working with Google's documentation and examples.
In this project folder we have these files:

08/17/2017  11:12 PM                98 app.yaml
08/17/2017  11:12 PM               854 main.py
08/17/2017  11:12 PM               817 main_test.py
Let's see what these files contain.
First is app.yaml, which contains:
runtime: python27
api_version: 1
threadsafe: true

handlers:
- url: /.*
  script: main.app
Next is the main.py file:
# Copyright 2016 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import webapp2


class MainPage(webapp2.RequestHandler):
    def get(self):
        self.response.headers['Content-Type'] = 'text/plain'
        self.response.write('Hello, World!')


app = webapp2.WSGIApplication([
    ('/', MainPage),
], debug=True)
The last file in this folder is main_test.py:
# Copyright 2016 Google Inc. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import webtest

import main


def test_get():
    app = webtest.TestApp(main.app)

    response = app.get('/')

    assert response.status_int == 200
    assert response.body == 'Hello, World!'
The app.yaml file is used to configure your App Engine application's settings.
You can have several application-level configuration files (dispatch.yaml, cron.yaml, index.yaml, and queue.yaml).
All of these configuration files are included in the top-level app directory (in this case: hello_world).
Let's try some common gcloud commands by testing some changes.
First, change the text in the main.py file to something else:
self.response.write('Hello, World!')
Now use these commands:
C:\Python27\python-docs-samples\appengine\standard\hello_world>gcloud app deploy
C:\Python27\python-docs-samples\appengine\standard\hello_world>gcloud app browse
The result is shown in your browser.
You can read about these files in the Google documentation page - here.
Some gcloud commands and reference material can also be read here.

August 18, 2017 02:13 PM


Codementor

Python Practices for Efficient Code: Performance, Memory, and Usability

Explore best practices to write Python code that executes faster, uses less memory, and looks more appealing.

August 18, 2017 10:51 AM

How I Learned Python Programming Language

Read about one person's perspective on learning to program using Python.

August 18, 2017 08:06 AM

August 17, 2017


Anwesha Das

DreamHost fighting to protect the fundamental rights of its users

Habeas data, my data my right, is the ethos of the right to be a free and fulfilled individual. It allows the individual to be himself or herself without being monitored.

In the United States, there are several safeguards to protect and further this concept.

The First Amendment

The First Amendment (Amendment I) to the United States Constitution establishes the freedoms of religion, speech, the press, assembly, and petition.

The Fourth Amendment

The Fourth Amendment (Amendment IV) to the United States Constitution protects against unreasonable searches and seizures.

The Privacy Protection Act, 1980

The Act protects the press, journalists, media houses, and newsrooms from searches conducted by government office bearers. It mandates that it shall be unlawful for a government employee to search for or seize “work product” or “documentary materials” that are possessed by a person “in connection with a purpose to disseminate to the public a newspaper, book, broadcast, or other similar form of public communication”, in connection with the investigation or prosecution of a criminal offense [42 U.S.C. §§ 2000aa (a), (b) (1996)]. An order, a subpoena, is necessary for accessing the information and documents.

But the government has time and again violated and disregarded these mandates and stepped outside its bounds in the name of the security of the state.

The present situation with DreamHost

DreamHost is a private, Los Angeles-based company. It provides the following services: web hosting, cloud computing, cloud storage, and domain name registration. For the past few months the company has been fighting a legal battle to protect its own fundamental rights and those of one of its customers, disruptj20.org.

What is disruptj20.org?

The company hosts disruptj20.org on the web. It is a website that organized and encouraged willing individuals to protest against the present US government. Wikipedia says: “DisruptJ20 (also Disrupt J20), a Washington, D.C.-based political organization founded in July 2016 and publicly launched on November 11 of the same year, stated its initial aim as protesting and disrupting events of the presidential inauguration of the 45th U.S. president, Donald Trump.”

The Search Warrant

There was a search warrant issued against DreamHost. It requires them to disclose “the information associated with www.disruptj20.org that is stored at the premises owned, maintained, controlled, or operated by DreamHost” [ATTACHMENT A].

The particular list of information to be disclosed and information to be seized by the government can be seen at ATTACHMENT B.

How does it affect third parties (other than www.disruptj20.org)?

It demands that “all files” related to the website be revealed to the government, which includes the HTTP logs for the visitors, meaning that the identities and browsing records of the site's visitors would be handed over.

In response, the company challenged the Department of Justice on the warrant. They made an attempt to quash the demand for seizure and disclosure of the information by due legal process and reason.

Motion to show cause

In the usual course of action, the DOJ would respond to DreamHost's inquiries. But here, instead of answering their inquiries, the DOJ chose to file a motion to show cause in the Washington, D.C. Superior Court, asking for an order to compel DreamHost to produce the records.

The Opposition

DreamHost filed an Opposition seeking denial of the above-mentioned motion. The “Argument” section demonstrates:

“This motion is our latest salvo in what has become a months-long battle to protect the identities of thousands of unwitting internet users.”

The Electronic Frontier Foundation has lent its support and help to DreamHost, though it is not representing the company in court. The matter will be heard on August 18 in Washington, D.C.

There are different kinds of security. Security for state power is constantly protected, in contrast to security for the population, which is constantly denied, negated, curbed and restrained. Looking at the series of events, the documentary record of this particular incident raises a doubt:

Considering the nature and subject of the website, the only security being considered here is probably the security of staying in power. Now is the time for people to stand up to save the individual's, the commoner's, right to have private space, opinion and protest. Kudos to DreamHost for protecting the primary fundamental right of an individual: privacy.

August 17, 2017 07:13 PM


Catalin George Festila

The Google Cloud SDK - part 001 .

This tutorial will cover these steps for development with the Google Cloud SDK and Python version 2.7:


First you need to download the Google Cloud SDK and run it.


After the GUI install, a command window will ask you to set the default project for your work.
Welcome to the Google Cloud SDK! Run "gcloud -h" to get the list of available commands.
---
Welcome! This command will take you through the configuration of gcloud.

Your current configuration has been set to: [default]

You can skip diagnostics next time by using the following flag:
gcloud init --skip-diagnostics

Network diagnostic detects and fixes local network connection issues.
Checking network connection...done.
Reachability Check passed.
Network diagnostic (1/1 checks) passed.

You must log in to continue. Would you like to log in (Y/n)? Y
...
The next step is to continue online by deploying a Hello World app, using the “Deploy a Hello World app” tutorial:

This will start an online tutorial in the right-hand area of the screen, with all the commands and steps for your Google Cloud SDK online project.
Follow these steps and at the end you will see the online Google Cloud SDK project show Hello, World! in your browser.
The next step is to make a local project and run it.
You can use the python-docs-samples from GoogleCloudPlatform, but it is not the same as the online example.
To download the GoogleCloudPlatform sample, use the git command:
C:\Python27>git clone https://github.com/GoogleCloudPlatform/python-docs-samples
Cloning into 'python-docs-samples'...
remote: Counting objects: 12126, done.
remote: Compressing objects: 100% (16/16), done.
remote: Total 12126 (delta 1), reused 10 (delta 1), pack-reused 12106
Receiving objects: 100% (12126/12126), 3.37 MiB | 359.00 KiB/s, done.
Resolving deltas: 100% (6408/6408), done.

C:\Python27>cd python-docs-samples/appengine/standard/hello_world
To deploy this sample to your Google project, use this:
C:\Python27\python-docs-samples\appengine\standard\hello_world>gcloud app deploy app.yaml --project encoded-metrics-147522
Services to deploy:

descriptor: [C:\Python27\python-docs-samples\appengine\standard\hello_world\app.yaml]
source: [C:\Python27\python-docs-samples\appengine\standard\hello_world]
target project: [encoded-metrics-147522]
target service: [default]
target version: [20170817t234925]
target url: [https://encoded-metrics-147522.appspot.com]


Do you want to continue (Y/n)? Y

Beginning deployment of service [default]...
#============================================================#
#= Uploading 5 files to Google Cloud Storage =#
#============================================================#
File upload done.
Updating service [default]...done.
Waiting for operation [apps/encoded-metrics-147522/operations/XXXXXX] to complete...done.
Updating service [default]...done.
Deployed service [default] to [https://XXXXXX.appspot.com]

You can stream logs from the command line by running:
$ gcloud app logs tail -s default

To view your application in the web browser run:
$ gcloud app browse

C:\Python27\python-docs-samples\appengine\standard\hello_world>gcloud app browse
Opening [https://XXXXXX.appspot.com] in a new tab in your default browser.

C:\Python27\python-docs-samples\appengine\standard\hello_world>
This will open your application in your browser at the web address XXXXXX.appspot.com, showing the text Hello, World!.



August 17, 2017 02:06 PM


Codementor

How to run a script as a background process?

A simple demonstration on how to run a script as a background process in a Debian environment.

August 17, 2017 08:42 AM


Python Bytes

#39 The new PyPI

Mahmoud #1: The New PyPI (https://pypi.org/)

- Donald Stufft and his PyPA team have been hard at work replacing the old pypi.python.org
- The new site is now handling almost all the old functionality (excepting deprecated features, of course): https://pypi.org/
- The new site has handled downloads (presently exceeding 1PB monthly bandwidth) for a while now, and uploads as of recently.
- A nice full-fledged, open-source Python application, eagerly awaiting your review and contribution: https://github.com/pypa/warehouse/
- More updates at: https://mail.python.org/pipermail/distutils-sig/

Brian #2: CircuitPython Snakes its Way onto Adafruit Hardware (http://makezine.com/2017/08/11/circuitpython-snakes-way-adafruit-hardware/)

- Adafruit announced CircuitPython in January (https://blog.adafruit.com/2017/01/09/welcome-to-the-adafruit-circuitpython-beta/)
  - “CircuitPython is based on the open-source MicroPython which brings the popular Python language to microcontrollers. The goal of CircuitPython is to make hardware as simple and easy as possible.”
  - Already runs on Metro M0 Express and Feather M0 Express, and they are working on support for Circuit Playground Express, and now Gemma M0
- The new product is Gemma M0 (https://www.adafruit.com/product/3501):
  - Announced at the end of July.
  - It’s about the size of a quarter and is considered a wearable computer.
  - “When you plug it in, it will show up as a very small disk drive with main.py on it. Edit main.py with your favorite text editor to build your project using Python, the most popular programming language. No installs, IDE or compiler needed, so you can use it on any computer, even ChromeBooks or computers you can’t install software on. When you’re done, unplug the Gemma M0 and your code will go with you.”
  - They’re under $10. I gotta get one of these and play with it. (Anyone from Adafruit listening, want to send me one?)
  - Here’s the intro video for it: https://www.youtube.com/watch?v=nRE_cryQJ5c
- Creating and sharing a CircuitPython Library (https://learn.adafruit.com/creating-and-sharing-a-circuitpython-library) is a good introduction to the Python open source community, including:
  - Creating a library (package or module)
  - Sharing on GitHub
  - Sharing docs on ReadTheDocs
  - Testing with Travis CI
  - Releasing on GitHub

Mahmoud #3: Dataclasses

- Python has had classes for a long time, but maybe it’s time for some updated syntax and semantics, something higher level perhaps?
- dataclasses is an interesting case of Python’s core devs doing their own take on community innovation (Hynek’s attrs: https://attrs.org)
- Code, issues, and draft PEP at https://github.com/ericvsmith/dataclasses

Brian #4: Pandas in a Nutshell (http://kanoki.org/2017/07/16/pandas-in-a-nutshell/)

- Jupyter Notebook style post. Tutorial by example with just a bit of extra text for explanation.
- Data structures:
  - Series: a one-dimensional array with indexes; it stores a single column or row of data in a Dataframe
  - Dataframe: a tabular, spreadsheet-like structure representing rows, each of which contains one or multiple columns
- Series: custom indices, adding two series, naming series, …
- Dataframes: using .head() and .tail(), info(), adding columns, adding a column as a calculation of another column, deleting a column, creating a dataframe from a dictionary, reindexing, summing columns and rows, .describe() for simple statistics, corr() for correlations, dealing with missing values, dropping rows, selecting, sorting, multi-indexing, grouping

Mahmoud #5: Static Typing

- PyBay 2017, which ended a day before recording, featured a neat panel on static typing in Python.
- One member each from Google, Quora, PyCharm, Facebook, and University of California
- Three different static analysis tools (four, if you count PyLint)
- They’re all collaborating already, and open to much more, as we can see in this collection of the stdlib’s type defs: https://github.com/python/typeshed
- A fair degree of consensus around static types being most useful for testable documentation, like doctests, but with more systemic implications
- Not intended to be an algebraic type system (like Haskell, etc.)

Brian #6: Full Stack Python Explains ORMs (https://www.fullstackpython.com/object-relational-mappers-orms.html)

- What are Object Relational Mappers?
  - “An object-relational mapper (ORM) is a code library that automates the transfer of data stored in relational database tables into objects that are more commonly used in application code.”
- Why are they useful?
  - “ORMs provide a high-level abstraction upon a relational database that allows a developer to write Python code instead of SQL to create, read, update and delete data and schemas in their database.”
- Do you need to use them?
- Downsides to ORMs:
  - Impedance mismatch: “the way a developer uses objects is different from how data is stored and joined in relational tables”
  - Potential for reduced performance: code in the middle isn’t free
  - Shifting complexity from the database into the application code: people usually don’t use database stored procedures when working with ORMs.
- A handful of popular ones, including Django ORM, SQLAlchemy, Peewee, Pony, and SQLObject. Mostly listed to point out that they are active projects, with a brief description and links for more info.
- Matt also has a SQLAlchemy page (https://www.fullstackpython.com/sqlalchemy.html) and a peewee page (https://www.fullstackpython.com/peewee.html) for more info on them.

Extra Mahmoud:

- hyperlink: https://github.com/python-hyper/hyperlink
- riot.im (server code in Python: https://github.com/matrix-org/synapse)

Extra Brian:

- Python Testing with pytest (https://pragprog.com/book/bopytest/python-testing-with-pytest) has a Discussion Forum (https://forums.pragprog.com/forums/438). It’s something that I think all Pragmatic books have. Just this morning I answered a question about the difference between monkeypatch and mock and when you would use one over the other.

August 17, 2017 08:00 AM


Duncan McGreggor

NASA/EOSDIS Earthdata

Update

It's been a few years since I posted on this blog -- most of the technical content I've been contributing to in the past couple years has been in the following:
But since the publication of the Mastering matplotlib book, I've gotten more and more into satellite data. The book, it goes without saying, focused on Python for the analysis and interpretation of satellite data (in one of the many topics covered). After that I spent some time working with satellite and GIS data in general using Erlang and LFE. Ultimately though, I found that more and more projects were using the JVM for this sort of work, and in particular, I noted that Clojure had begun to show up in a surprising number of Github projects.

EOSDIS

Enter NASA's Earth Observing System Data and Information System (see also earthdata.nasa.gov and EOSDIS on Wikipedia), a key part of the agency's Earth Science Data Systems Program. It's essentially a concerted effort to bring together the mind-blowing amounts of earth-related data being collected throughout, around, and above the world so that scientists may easily access and correlate earth science data for their research.

Related NASA projects include the following:
The acronym menagerie can be bewildering, but digging into the various NASA projects is ultimately quite rewarding (greater insights, previously unknown resources, amazing research, etc.).

Clojure

Back to the Clojure reference I made above: I've been contributing to the nasa/Common-Metadata-Repository open source project (hosted on Github) for a few months now, and it's been amazing to see how all this data from so many different sources gets added, indexed, updated, and generally made so much more available to anyone who wants to work with it. The private sector always seems to be so far ahead of large projects in terms of tech and continuously improving updates to existing software, so it's been pretty cool to see a large open source project in the NASA Github org make so many changes that keep helping their users do better research. Even more so that users are regularly delivered new features in a large, complex collection of libraries and services, thanks in part to the benefits that come from using a functional programming language.

It may seem like nothing to you, but the fact that there are now directory pages for various data providers (e.g., GES_DISC, i.e., Goddard Earth Sciences Data and Information Services Center) makes a big difference for users of this data. The data provider pages now also offer easy access to collection links such as UARS Solar Ultraviolet Spectral Irradiance Monitor. Admittedly, the directory pages still take a while to load, but there are improvements on the way for page load times and other related tasks. If you're reading this a month after this post was written, there's a good chance it's already been fixed by now.

Summary

In summary, it's been a fun personal journey from looking at Landsat data for writing a book to working with open source projects that really help scientists to do their jobs better :-) And while I have enjoyed using the other programming languages to explore this problem space, Clojure in particular has been a delightfully powerful tool for delivering new features to the science community.

August 17, 2017 07:05 AM

August 16, 2017


Continuum Analytics News

Continuum Analytics to Share Insights at JupyterCon 2017

Thursday, August 17, 2017

Presentation topics include Jupyter and Anaconda in the enterprise; open innovation in a data-centric world; building an Excel-Python bridge; encapsulating data science using Anaconda Project and JupyterLab; deploying Jupyter dashboards for datapoints; JupyterLab

NEW YORK, August 17, 2017—Continuum Analytics, the creator and driving force behind Anaconda, the leading Python data science platform, today announced that the team will present one keynote, three talks and two tutorials at JupyterCon on August 23 and 24 in NYC, NY. The event is designed for the data science and business analyst community and offers in-depth trainings, insightful keynotes, networking events and talks exploring the Project Jupyter platform.

Peter Wang, co-founder and CTO of Continuum Analytics, will present two sessions on August 24. The first is a keynote at 9:15 am, titled “Jupyter & Anaconda: Shaking Up the Enterprise.” Peter will discuss the co-evolution of these two major players in the new open source data science ecosystem and next steps to a sustainable future. The other is a talk, “Fueling Open Innovation in a Data-Centric World,” at 11:55 am, offering Peter’s perspectives on the unique challenges of building a company that is fundamentally centered around sustainable open source innovation.

The second talk features Christine Doig, senior data scientist, product manager, and Fabio Pliger, software engineer, of Continuum Analytics, “Leveraging Jupyter to build an Excel-Python Bridge.” It will take place on August 24 at 11:05 am and Christine and Fabio will share how they created a native Microsoft Excel plug-in that provides a point-and-click interface to Python functions, enabling Excel analysts to use machine learning models, advanced interactive visualizations and distributed compute frameworks without needing to write any code. Christine will also be holding a talk on August 25 at 11:55 am on “Data Science Encapsulation and Deployment with Anaconda Project & JupyterLab.” Christine will share how Anaconda Project and JupyterLab encapsulate data science and how to deploy self-service notebooks, interactive applications, dashboards and machine learning.

James Bednar, senior solutions architect, and Philipp Rudiger, software developer, of Continuum Analytics, will give a tutorial on August 23 at 1:30 pm titled, “Deploying Interactive Jupyter Dashboards for Visualizing Hundreds of Millions of Datapoints.” This tutorial will explore an overall workflow for building interactive dashboards, visualizing billions of data points interactively in a Jupyter notebook, with graphical widgets allowing control over data selection, filtering and display options, all using only a few dozen lines of code.

The second tutorial, “JupyterLab,” will be hosted by Steven Silvester, software engineer at Continuum Analytics and Jason Grout, software developer at Bloomberg, on August 23 at 1:30 pm. They will walk through JupyterLab as a user and as an extension author, exploring its capabilities and offering a demonstration on how to create a simple extension to the environment.

Keynote:
WHO: Peter Wang, co-founder and CTO, Anaconda Powered by Continuum Analytics
WHAT: Jupyter & Anaconda: Shaking Up the Enterprise
WHEN: August 24, 9:15am-9:25am ET
WHERE: Grand Ballroom

Talk #1:
WHO: Peter Wang, co-founder and CTO, Anaconda Powered by Continuum Analytics
WHAT: Fueling Open Innovation in a Data-Centric World
WHEN: August 24, 11:55am–12:35pm ET
WHERE: Regent Parlor

Talk #2:
WHO: 

  • Christine Doig, senior data scientist, product manager, Anaconda Powered by Continuum Analytics
  • Fabio Pliger, software engineer, Anaconda Powered by Continuum Analytics

WHAT: Leveraging Jupyter to Build an Excel-Python Bridge
WHEN: August 24, 11:05am–11:45am ET
WHERE: Murray Hill

Talk #3:
WHO: Christine Doig, senior data scientist, product manager, Anaconda Powered by Continuum Analytics
WHAT: Data Science Encapsulation and Deployment with Anaconda Project & JupyterLab
WHEN: August 25, 11:55am–12:35pm ET
WHERE: Regent Parlor

Tutorial #1:
WHO: 

  • James Bednar, senior solutions architect, Anaconda Powered By Continuum Analytics 
  • Philipp Rudiger, software developer, Anaconda Powered By Continuum Analytics 

WHAT: Deploying Interactive Jupyter Dashboards for Visualizing Hundreds of Millions of Datapoints
WHEN: August 23, 1:30pm–5:00pm ET
WHERE: Concourse E

Tutorial #2:
WHO: 

  • Steven Silvester, software engineer, Anaconda Powered By Continuum Analytics 
  • Jason Grout, software developer of Bloomberg

WHAT: JupyterLab Tutorial
WHEN: August 23, 1:30pm–5:00pm ET
WHERE: Concourse A

###

About Anaconda Powered by Continuum Analytics
Anaconda is the leading data science platform powered by Python, the fastest growing data science language with more than 30 million downloads to date. Continuum Analytics is the creator and driving force behind Anaconda, empowering leading businesses across industries worldwide with solutions to identify patterns in data, uncover key insights and transform data into a goldmine of intelligence to solve the world’s most challenging problems. Anaconda puts superpowers into the hands of people who are changing the world. Learn more at continuum.io

###

Media Contact:
Jill Rosenthal
InkHouse
anaconda@inkhouse.com

 

August 16, 2017 03:12 PM


Eli Bendersky

Right and left folds, primitive recursion patterns in Python and Haskell

A "fold" is a fundamental primitive in defining operations on data structures; it's particularly important in functional languages where recursion is the default tool to express repetition. In this article I'll present how left and right folds work and how they map to some fundamental recursive patterns.

The article starts with Python, which should be (or at least look) familiar to most programmers. It then switches to Haskell for a discussion of more advanced topics like the connection between folding and laziness, as well as monoids.

Extracting a fundamental recursive pattern

Let's begin by defining a couple of straightforward functions in a recursive manner, in Python. First, computing the product of all the numbers in a given list:

def product(seq):
    if not seq:
        return 1
    else:
        return seq[0] * product(seq[1:])

Needless to say, we wouldn't really write this function recursively in Python; but if we were to, this is probably how we'd write it.

Now another, slightly different, function. How do we double (multiply by 2) every element in a list, recursively?

def double(seq):
    if not seq:
        return []
    else:
        return [seq[0] * 2] + double(seq[1:])

Again, ignoring the fact that Python has much better ways to do this (list comprehensions, for example), this is a straightforward recursive pattern that experienced programmers can produce in their sleep.

In fact, there's a lot in common between these two implementations. Let's try to find the commonalities.

Recursion pattern in the product function

As this diagram shows, the functions product and double are really only different in three places:

  1. The initial value produced when the input sequence is empty.
  2. The mapping applied to every sequence value processed.
  3. The combination of the mapped sequence value with the rest of the sequence.

For product:

  1. The initial value is 1.
  2. The mapping is identity (each sequence element just keeps its value, without change).
  3. The combination is the multiplication operator.

Can you figure out the same classification for double? Take a few moments to try for yourself. Here it is:

  1. The initial value is the empty list [].
  2. The mapping takes a value, multiplies it by 2 and puts it into a list. We could express this in Python as lambda x: [x * 2].
  3. The combination is the list concatenation operator +.

With the diagram above and these examples, it's straightforward to write a generalized "recursive transform" function that can be used to implement both product and double:

def transform(init, mapping, combination, seq):
    if not seq:
        return init
    else:
        return combination(mapping(seq[0]),
                           transform(init, mapping, combination, seq[1:]))

The transform function is parameterized with init - the initial value, mapping - a mapping function applied to every sequence value, and combination - the combination of the mapped sequence value with the rest of the sequence. With these given, it implements the actual recursive traversal of the list.

Here's how we'd write product in terms of transform:

def product_with_transform(seq):
    return transform(1, lambda x: x, lambda a, b: a * b, seq)

And double:

def double_with_transform(seq):
    return transform([], lambda x: [x * 2], lambda a, b: a + b, seq)

foldr - fold right

Generalizations like transform make functional programming fun and powerful, since they let us express complex ideas with the help of relatively few building blocks. Let's take this idea further, by generalizing transform even more. The main insight guiding us is that the mapping and the combination don't even have to be separate functions. A single function can play both roles.

In the definition of transform, combination is applied to:

  1. The result of calling mapping on the current sequence value.
  2. The recursive application of the transformation to the rest of the sequence.

We can encapsulate both in a function we call the "reduction function". This reduction function takes two arguments: the current sequence value (item), and the result of applying the full transformation to the rest of the sequence. The driving transformation that uses this reduction function is called "a right fold" (or foldr):

def foldr(func, init, seq):
    if not seq:
        return init
    else:
        return func(seq[0], foldr(func, init, seq[1:]))

We'll get to why this is called "fold" shortly; first, let's convince ourselves it really works. Here's product implemented using foldr:

def product_with_foldr(seq):
    return foldr(lambda seqval, acc: seqval * acc, 1, seq)

The key here is the func argument. In the case of product, it "reduces" the current sequence value with the "accumulator" (the result of the overall transformation invoked on the rest of the sequence) by multiplying them together. The overall result is a product of all the elements in the list.

Let's trace the calls to see the recursion pattern. I'll be using the tracing technique described in this post. For this purpose I hoisted the reducing function into a standalone function called product_reducer:

def product_reducer(seqval, acc):
    return seqval * acc

def product_with_foldr(seq):
    return foldr(product_reducer, 1, seq)

The full code for this experiment is available here. Here's the tracing of invoking product_with_foldr([2, 4, 6, 8]):

product_with_foldr([2, 4, 6, 8])
  foldr(<function product_reducer at 0x7f3415145ae8>, 1, [2, 4, 6, 8])
    foldr(<function product_reducer at 0x7f3415145ae8>, 1, [4, 6, 8])
      foldr(<function product_reducer at 0x7f3415145ae8>, 1, [6, 8])
        foldr(<function product_reducer at 0x7f3415145ae8>, 1, [8])
          foldr(<function product_reducer at 0x7f3415145ae8>, 1, [])
          --> 1
          product_reducer(8, 1)
          --> 8
        --> 8
        product_reducer(6, 8)
        --> 48
      --> 48
      product_reducer(4, 48)
      --> 192
    --> 192
    product_reducer(2, 192)
    --> 384
  --> 384

The recursion first builds a full stack of calls for every element in the sequence, until the base case (empty list) is reached. Then the calls to product_reducer start executing. The first reduces 8 (the last element in the list) with 1 (the result of the base case). The second reduces this result with 6 (the second-to-last element in the list), and so on until we reach the final result.

Since foldr is just a generic traversal pattern, we can say that the real work here happens in the reducers. If we build a tree of invocations of product_reducer, we get:

foldr mul tree

And this is why it's called the right fold. It takes the rightmost element and combines it with init. Then it takes the result and combines it with the second rightmost element, and so on until the first element is reached. In expression form, for [2, 4, 6, 8] this computes 2 * (4 * (6 * (8 * 1))).

More general operations with foldr

We've seen how foldr can implement all kinds of functions on lists by encapsulating a fundamental recursive pattern. Let's see a couple more examples. The function double shown above is just a special case of the functional map primitive:

def map(mapf, seq):
    if not seq:
        return []
    else:
        return [mapf(seq[0])] + map(mapf, seq[1:])

Instead of applying a hardcoded "multiply by 2" function to each element in the sequence, map applies a user-provided unary function. Here's map implemented in terms of foldr:

def map_with_foldr(mapf, seq):
    return foldr(lambda seqval, acc: [mapf(seqval)] + acc, [], seq)
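
For example, doubling with this foldr-based map reproduces our earlier result:

>>> map_with_foldr(lambda x: x * 2, [2, 4, 6, 8])
[4, 8, 12, 16]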

Another functional primitive that we can implement with foldr is filter. This one is just a bit trickier because we sometimes want to "skip" a value based on what the filtering predicate returns:

def filter(predicate, seq):
    if not seq:
        return []
    else:
        maybeitem = [seq[0]] if predicate(seq[0]) else []
        return maybeitem + filter(predicate, seq[1:])

Feel free to try to rewrite it with foldr as an exercise before looking at the code below. We just follow the same pattern:

def filter_with_foldr(predicate, seq):
    def reducer(seqval, acc):
        if predicate(seqval):
            return [seqval] + acc
        else:
            return acc
    return foldr(reducer, [], seq)

We can also represent less "linear" operations with foldr. For example, here's a function to reverse a sequence:

def reverse_with_foldr(seq):
    return foldr(lambda seqval, acc: acc + [seqval], [], seq)

Note how similar it is to map_with_foldr; only the order of concatenation is flipped.
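
Again, a quick interpreter check of both:

>>> filter_with_foldr(lambda x: x % 2 == 0, [1, 2, 3, 4])
[2, 4]
>>> reverse_with_foldr([1, 2, 3])
[3, 2, 1]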

Left-associative operations and foldl

Let's probe at some of the apparent limitations of foldr. We've seen how it can be used to easily compute the product of numbers in a sequence. What about a ratio? For the list [3, 2, 2] the ratio is "3 divided by 2, divided by 2", or 0.75 [1].

If we take product_with_foldr from above and replace * by /, we get:

>>> foldr(lambda seqval, acc: seqval / acc, 1, [3, 2, 2])
3.0

What gives? The problem here is the associativity of the operator /. Take another look at the call tree diagram shown above. It's obvious this diagram represents a right-associative evaluation. In other words, what our attempt at a ratio did is compute 3 / (2 / 2), which is indeed 3.0; instead, what we'd like is (3 / 2) / 2. But foldr is fundamentally folding the expression from the right. This works well for associative operations like + or * (operations that don't care about the order in which they are applied to a sequence), and also for right-associative operations like exponentiation, but it doesn't work that well for left-associative operations like / or -.
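
A quick check makes the difference in grouping concrete:

>>> 3 / (2 / 2)   # right-associative grouping, what foldr computes
3.0
>>> (3 / 2) / 2   # left-associative grouping, what we want
0.75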

This is where the left fold comes in. It does precisely what you'd expect - folds a sequence from the left, rather than from the right. I'm going to leave the division operation for later [2] and use another example of a left-associative operation: converting a sequence of digits into a number. For example [2, 3] represents 23, [3, 4, 5, 6] represents 3456, etc. (a related problem which is more common in introductory programming is converting a string that contains a number into an integer).

The basic reducing operation we'll use here is: acc * 10 + sequence value. To get 3456 from [3, 4, 5, 6] we'll compute:

(((((3 * 10) + 4) * 10) + 5) * 10) + 6

Note how this operation is left-associative. Reorganizing the parens to a rightmost-first evaluation would give us a completely different result.

Without further ado, here's the left fold:

def foldl(func, init, seq):
    if not seq:
        return init
    else:
        return foldl(func, func(init, seq[0]), seq[1:])

Note that the order of calls between the recursive call to itself and the call to func is reversed vs. foldr. This is also why it's customary to put acc first and seqval second in the reducing functions passed to foldl.

If we perform multiplication with foldl:

def product_with_foldl(seq):
    return foldl(lambda acc, seqval: acc * seqval, 1, seq)

We'll get this trace:

product_with_foldl([2, 4, 6, 8])
  foldl(<function product_reducer at 0x7f2924cbdc80>, 1, [2, 4, 6, 8])
    product_reducer(1, 2)
    --> 2
    foldl(<function product_reducer at 0x7f2924cbdc80>, 2, [4, 6, 8])
      product_reducer(2, 4)
      --> 8
      foldl(<function product_reducer at 0x7f2924cbdc80>, 8, [6, 8])
        product_reducer(8, 6)
        --> 48
        foldl(<function product_reducer at 0x7f2924cbdc80>, 48, [8])
          product_reducer(48, 8)
          --> 384
          foldl(<function product_reducer at 0x7f2924cbdc80>, 384, [])
          --> 384
        --> 384
      --> 384
    --> 384
  --> 384

Contrary to the right fold, the reduction function here is called immediately for each recursive step, rather than waiting for the recursion to reach the end of the sequence first. Let's draw the call graph to make the folding-from-the-left obvious:

foldl mul tree

Now, to implement the digits-to-a-number function task described earlier, we'll write:

def digits2num_with_foldl(seq):
    return foldl(lambda acc, seqval: acc * 10 + seqval, 0, seq)
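
And it computes what we expect:

>>> digits2num_with_foldl([3, 4, 5, 6])
3456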

Stepping it up a notch - function composition with foldr

Since we're looking at functional programming primitives, it's only natural to consider how to put higher order functions to more use in combination with folds. Let's see how to express function composition; the input is a sequence of unary functions: [f, g, h] and the output is a single function that implements f(g(h(...))). Note this operation is right-associative, so it's a natural candidate for foldr:

identity = lambda x: x

def fcompose_with_foldr(fseq):
    return foldr(lambda seqval, acc: lambda x: seqval(acc(x)), identity, fseq)

In this case seqval and acc are both functions. Each step in the fold consumes a new function from the sequence and composes it on top of the accumulator (which is the function composed so far). The initial value for this fold has to be the identity for the composition operation, which just happens to be the identity function.

>>> f = fcompose_with_foldr([lambda x: x+1, lambda x: x*7, lambda x: -x])
>>> f(8)
-55

Let's take this trick one step farther. Recall how I said foldr is limited to right-associative operations? Well, I lied a little. While it's true that the fundamental recursive pattern expressed by foldr is right-associative, we can use the function composition trick to evaluate some operation on a sequence in a left-associative way. Here's the digits-to-a-number function with foldr:

def digits2num_with_foldr(seq):
    composed = foldr(
                lambda seqval, acc: lambda n: acc(n * 10 + seqval),
                identity,
                seq)
    return composed(0)

To understand what's going on, manually trace the invocation of this function on some simple sequence like [1, 2, 3]. The key to this approach is to recall that foldr gets to the end of the list before it actually starts applying the function it folds. The following is a careful trace of what happens, with the folded function replaced by g for clarity.

digits2num_with_foldr([1, 2, 3])
-> foldr(g, identity, [1, 2, 3])
-> g(1, foldr(g, identity, [2, 3]))
-> g(1, g(2, foldr(g, identity, [3])))
-> g(1, g(2, g(3, foldr(g, identity, []))))
-> g(1, g(2, g(3, identity)))
-> g(1, g(2, lambda n: identity(n * 10 + 3)))

Now things become a bit trickier to track because of the different anonymous functions and their bound variables. It helps to give these function names.

<f1 = lambda n: identity(n * 10 + 3)>
-> g(1, g(2, f1))
-> g(1, lambda n: f1(n * 10 + 2))
<f2 = lambda n: f1(n * 10 + 2)>
-> g(1, f2)
-> lambda n: f2(n * 10 + 1)

Finally, we invoke this returned function on 0:

f2(0 * 10 + 1)
-> f1(1 * 10 + 2)
-> identity(12 * 10 + 3)
-> 123

In other words, the actual computation passed to that final identity is:

((1 * 10) + 2) * 10 + 3

Which is the left-associative application of the folded function.

Expressing foldl with foldr

After the last example, it's not very surprising that we can take this approach to its logical conclusion and express the general foldl by using foldr. It's just a generalization of digits2num_with_foldr:

def foldl_with_foldr(func, init, seq):
    composed = foldr(
                lambda seqval, acc: lambda n: acc(func(n, seqval)),
                identity,
                seq)
    return composed(init)
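
As a sanity check, replaying the digits example through this adapter gives the same left-associative result:

>>> foldl_with_foldr(lambda acc, x: acc * 10 + x, 0, [3, 4, 5, 6])
3456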

In fact, the pattern expressed by foldr is very close to what is called primitive recursion by Stephen Kleene in his 1952 book Introduction to Metamathematics. In other words, foldr can be used to express a wide range of recursive patterns. I won't get into the theory here, but Graham Hutton's article A tutorial on the universality and expressiveness of fold is a good read.

foldr and foldl in Haskell

Now I'll switch gears a bit and talk about Haskell. Writing transformations with folds is not really Pythonic, but it's very much the default Haskell style. In Haskell recursion is the way to iterate.

Haskell is a lazily evaluated language, which makes the discussion of folds a bit more interesting. While this behavior isn't hard to emulate in Python, the Haskell code dealing with folds on lazy sequences is pleasantly concise and clear.

Let's start by implementing product and double - the functions this article started with. Here's the function computing a product of a sequence of numbers [3]:

myproduct [] = 1
myproduct (x:xs) = x * myproduct xs

And a sample invocation:

*Main> myproduct [2,4,6,8]
384

The function doubling every element in a sequence:

mydouble [] = []
mydouble (x:xs) = [2 * x] ++ mydouble xs

Sample invocation:

*Main> mydouble [2,4,6,8]
[4,8,12,16]

IMHO, the Haskell variants of these functions make it very obvious that a right-fold recursive pattern is in play. The pattern matching idiom of (x:xs) on sequences splits the "head" from the "tail" of the sequence, and the combining function is applied between the head and the result of the transformation on the tail. Here's foldr in Haskell, with a type declaration that should help clarify what goes where:

myfoldr :: (b -> a -> a) -> a -> [b] -> a
myfoldr _ z [] = z
myfoldr f z (x:xs) = f x (myfoldr f z xs)

If you're not familiar with Haskell this code may look foreign, but it's really a one-to-one mapping of the Python code for foldr, using some Haskell idioms like pattern matching.

These are the product and doubling functions implemented with myfoldr, using currying to avoid specifying the last parameter:

myproductWithFoldr = myfoldr (*) 1

mydoubleWithFoldr = myfoldr (\x acc -> [2 * x] ++ acc) []

Haskell also has a built-in foldl which performs the left fold. Here's how we could write our own:

myfoldl :: (a -> b -> a) -> a -> [b] -> a
myfoldl _ z [] = z
myfoldl f z (x:xs) = myfoldl f (f z x) xs

And this is how we'd write the left-associative function to convert a sequence of digits into a number using this left fold:

digitsToNumWithFoldl = myfoldl (\acc x -> acc * 10 + x) 0

Folds, laziness and infinite lists

Haskell evaluates all expressions lazily by default, which can be either a blessing or a curse for folds, depending on what we need to do exactly. Let's start by looking at the cool applications of laziness with foldr.

Given infinite lists (yes, Haskell easily supports infinite lists because of laziness), it's fairly easy to run short-circuiting algorithms on them with foldr. By short-circuiting I mean an algorithm that terminates the recursion at some point throughout the list, based on a condition.

As a silly but educational example, consider doubling every element in a sequence but only until a 5 is encountered, at which point we stop:

> foldr (\x acc -> if x == 5 then [] else [2 * x] ++ acc) [] [1,2,3,4,5,6,7]
[2,4,6,8]

Now let's try the same on an infinite list:

> foldr (\x acc -> if x == 5 then [] else [2 * x] ++ acc) [] [1..]
[2,4,6,8]

It terminates and returns the right answer! Even though our earlier stack trace of folding makes it appear like we iterate all the way to the end of the input list, this is not the case for our folding function. Since the folding function doesn't use acc when x == 5, Haskell won't evaluate the recursive fold further [4].
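
As an aside, the earlier remark about emulating this in Python can be made concrete with explicit thunks. This is just an illustrative sketch (lazy_foldr is an invented name, not a library function): the reducing function receives a zero-argument callable for the rest of the fold, and recursion only happens if the reducer actually invokes it.

from itertools import count

def lazy_foldr(func, init, iterable):
    it = iter(iterable)
    for x in it:
        # Hand 'func' a thunk for the rest of the fold; we only recurse
        # (and only consume more of the iterator) if func calls it.
        return func(x, lambda: lazy_foldr(func, init, it))
    return init

>>> lazy_foldr(lambda x, acc: [] if x == 5 else [2 * x] + acc(), [], count(1))
[2, 4, 6, 8]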

The same trick will not work with foldl, since foldl is not lazy in its second argument. Because of this, Haskell programmers are usually pointed to foldl', the eager version of foldl, as the better option. foldl' evaluates its accumulator eagerly, meaning that:

  1. It won't support infinite sequences (but neither does foldl!)
  2. It's significantly more efficient than foldl because it can be easily turned into a loop (note that the recursion in foldl is a tail call, and the eager foldl' doesn't have to build a thunk of increasing size due to laziness in the accumulator).

There is also an eager version of the right fold - foldr', which can be more efficient than foldr in some cases; it's not in Prelude but can be imported from Data.Foldable.

Folding vs. reducing

Our earlier discussion of folds may have reminded you of the reduce function (a built-in in Python 2; it lives in functools in Python 3), which seems to be doing something similar. In fact, Python's reduce implements the left fold where the first element in the sequence is used as the zero value. One nice property of reduce is that it doesn't require an explicit zero value (though it does support it via an optional parameter - this can be useful when the sequence is empty, for example).
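
For example (remember that reduce has to be imported from functools in Python 3):

>>> from functools import reduce
>>> reduce(lambda acc, x: acc / x, [3, 2, 2])
0.75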

Haskell has its own variations of folds that implement reduce - they have the digit 1 as suffix: foldl1 is the more direct equivalent of Python's reduce - it doesn't need an initializer and folds the sequence from the left. foldr1 is similar, but folds from the right. Both have eager variants: foldl1' and foldr1'.

I promised to revisit calculating the ratio of a sequence; here's a way, in Haskell:

myratioWithFoldl = foldl1 (/)

The problem with using a regular foldl is that there's no natural identity value to use on the leftmost side of a ratio (on the rightmost side 1 works, but the associativity is wrong). This is not an issue for foldl1, which starts the recursion with the first item in the sequence, rather than an explicit initial value.

*Main> myratioWithFoldl [3,2,2]
0.75

Note that foldl1 will throw an exception if the given sequence is empty, since it needs at least one item in there.

Folding arbitrary data structures

The built-in folds in Haskell are defined on lists. However, lists are not the only data structure we should be able to fold. Why can't we fold maps (say, summing up all the keys), or even custom data structures? What is the minimum amount of abstraction we can extract to enable folding?

Let's start by defining a simple binary tree data structure:

data Tree a = Empty | Leaf a | Node a (Tree a) (Tree a)
  deriving Show

-- A sample tree with a few nodes
t1 = Node 10 (Node 20 (Leaf 4) (Leaf 6)) (Leaf 7)

Suppose we want to fold the tree with (+), summing up all the values contained within it. How do we go about it? foldr or foldl won't cut it here - they expect [a], not Tree a. We could try to write our own foldr:

foldTree :: (b -> a -> a) -> a -> Tree b -> a
foldTree _ z Empty = z
foldTree f z (Leaf x) = ??
foldTree f z (Node x left right) = ??

There's a problem, however. Since we want to support an arbitrary folding result, we're not quite sure what to substitute for the ??s in the code above. In foldr, the folding function takes the accumulator and the next value in the sequence, but for trees it's not so simple. We may encounter a single leaf, and we may encounter several values to summarize; for the latter we have to invoke f on x as well as on the result of folding left and right. So it's not clear what the type of f should be - (b -> a -> a) doesn't appear to work [5].

A useful Haskell abstraction that can help us solve this problem is Monoid. A monoid is any data type that has an identity element (called mempty) and an associative binary operation called mappend. Monoids are, therefore, amenable to "summarization".

foldTree :: Monoid a => (b -> a) -> Tree b -> a
foldTree _ Empty = mempty
foldTree f (Leaf x) = f x
foldTree f (Node x left right) = (foldTree f left) <> f x <> (foldTree f right)

We no longer need to pass in an explicit zero element: since a is a Monoid, we have its mempty. Also, we can now apply a single (b -> a) function onto every element in the tree, and combine the results together into a summary using a's mappend (<> is the infix synonym of mappend).

The challenge of using foldTree is that we now actually need to use a unary function that returns a Monoid. Luckily, Haskell has some useful built-in monoids. For example, Data.Monoid.Sum wraps numbers into monoids under addition. We can find the sum of all elements in our tree t1 using foldTree and Sum:

> foldTree Sum t1
Sum {getSum = 47}

Similarly, Data.Monoid.Product wraps numbers into monoids under multiplication:

> foldTree Product t1
Product {getProduct = 33600}
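
If the typeclass machinery feels opaque, the same idea can be sketched in Python by passing the monoid's identity and operation in explicitly (an illustration only; these names are invented for this example, since Python has no built-in Monoid abstraction):

class Leaf:
    def __init__(self, value):
        self.value = value

class Node:
    def __init__(self, value, left, right):
        self.value, self.left, self.right = value, left, right

def fold_tree(f, mempty, mappend, tree):
    # None plays the role of Empty; mempty/mappend are the monoid's
    # identity element and associative binary operation.
    if tree is None:
        return mempty
    if isinstance(tree, Leaf):
        return f(tree.value)
    return mappend(mappend(fold_tree(f, mempty, mappend, tree.left),
                           f(tree.value)),
                   fold_tree(f, mempty, mappend, tree.right))

t1 = Node(10, Node(20, Leaf(4), Leaf(6)), Leaf(7))

>>> fold_tree(lambda x: x, 0, lambda a, b: a + b, t1)   # like Sum
47
>>> fold_tree(lambda x: x, 1, lambda a, b: a * b, t1)   # like Product
33600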

Haskell provides a built-in typeclass named Foldable (it lives in Data.Foldable) that only requires us to implement a similar mapping function, and then takes care of defining many folding methods. Here's the instance for our tree:

instance Foldable Tree where
  foldMap f Empty = mempty
  foldMap f (Leaf x) = f x
  foldMap f (Node x left right) = foldMap f left <> f x <> foldMap f right

And we'll automatically have foldr, foldl and other folding methods available on Tree objects:

> Data.Foldable.foldr (+) 0 t1
47

Note that we can pass a regular binary (+) here; Data.Foldable employs a bit of magic to turn this into a properly monoidal operation. We get many more useful methods on trees just from implementing foldMap:

> Data.Foldable.toList t1
[4,20,6,10,7]
> Data.Foldable.elem 6 t1
True

It's possible that for some special data structure these methods can be implemented more efficiently than by inference from foldMap, but nothing is stopping us from redefining specific methods in our Foldable instance. It's pretty cool, however, to see just how much functionality can be derived from having a single mapping method (and the Monoid guarantees) defined. See the documentation of Data.Foldable for more details.


[1]Note that I'm using Python 3 for all the code in this article; hence, Python 3's division semantics apply.
[2]Division has a problem with not having a natural "zero" element; therefore, it's more suitable for foldl1 and reduce, which are described later on.
[3]I'm prefixing most functions here with my since they have Haskell standard library builtin equivalents; while it's possible to avoid the name clashes with some import tricks, custom names are the least-effort approach, also for copy-pasting these code snippets into a REPL.
[4]I realize this is a very rudimentary explanation of Haskell laziness, but going deeper is really out of scope of this article. There are plenty of resources online to read about lazy vs. eager evaluation, if you're interested.
[5]We could try to apply f between the leaf value and z, but it's not clear in what order this should be done (what if f is sensitive to order?). Similarly for a Node, since there are no guarantees on the associativity of f, it's hard to predict what is the right way of applying it multiple times.

August 16, 2017 12:48 PM


Catalin George Festila

The DreamPie - interactive shell.

DreamPie was designed to bring you a great interactive Python shell experience.
There are two ways to install the DreamPie:

You can read about installation and download here.
To run it, just launch dreampie.exe with your Python interpreter; I used it with my Python 2.7 version:
C:\DreamPie>dreampie.exe --hide-console-window c:\Python27\python.exe
Let's see one screenshot of this running command:

Also, I tested it with Python 3.6.2 and it works well.
The main window is divided into the history box and the code box.
The history box lets you view previous commands and their output.
The code box is where you write your code.
Some keys I used:


You can set your font, colors and many other features.
I made the installation into the C:\DreamPie folder, and it comes with all these folders and files:
C:\DreamPie>tree
Folder PATH listing for volume free-tutorials
Volume serial number is 000000FF 0EB1:091D
C:.
├───data
│ ├───language-specs
│ ├───subp-py2
│ │ └───dreampielib
│ │ ├───common
│ │ └───subprocess
│ └───subp-py3
│ └───dreampielib
│ ├───common
│ └───subprocess
├───gtk-2.0
│ ├───cairo
│ ├───gio
│ ├───glib
│ ├───gobject
│ ├───gtk
│ └───runtime
│ ├───bin
│ ├───etc
│ │ ├───bash_completion.d
│ │ ├───fonts
│ │ ├───gtk-2.0
│ │ └───pango
│ ├───lib
│ │ ├───gdk-pixbuf-2.0
│ │ │ └───2.10.0
│ │ │ └───loaders
│ │ ├───glib-2.0
│ │ │ └───include
│ │ └───gtk-2.0
│ │ ├───2.10.0
│ │ │ └───engines
│ │ ├───include
│ │ └───modules
│ └───share
│ ├───aclocal
│ ├───dtds
│ ├───glib-2.0
│ │ ├───gdb
│ │ ├───gettext
│ │ │ └───po
│ │ └───schemas
│ ├───gtk-2.0
│ ├───gtksourceview-2.0
│ │ ├───language-specs
│ │ └───styles
│ ├───icon-naming-utils
│ ├───themes
│ │ ├───Default
│ │ │ └───gtk-2.0-key
│ │ ├───Emacs
│ │ │ └───gtk-2.0-key
│ │ ├───MS-Windows
│ │ │ └───gtk-2.0
│ │ └───Raleigh
│ │ └───gtk-2.0
│ └───xml
│ └───libglade
└───share
├───applications
├───man
│ └───man1
└───pixmaps

August 16, 2017 12:47 PM


PyCharm

Analyzing Data in Amazon Redshift with Pandas

Redshift is Amazon Web Services’ data warehousing solution. They’ve extended PostgreSQL to better suit large datasets used for analysis. When you hear about this kind of technology as a Python developer, it just makes sense to then unleash Pandas on it. So let’s have a look to see how we can analyze data in Redshift using a Pandas script!

Setting up Redshift

If you haven’t used Redshift before, you should be able to get the cluster up for free for 2 months. As long as you make sure that you don’t use more than 1 instance, and you use the smallest available instance.

To play around, let’s use Amazon’s example dataset, and to keep things very simple, let’s only load the ‘users’ table. Configuring AWS is a complex subject, and they’re a lot better at explaining how to do it than we are, so please complete the first four steps of the AWS tutorial for setting up an example Redshift environment. We’ll use PyCharm Professional Edition as the SQL client.

Connecting to Redshift

After spinning up Redshift, you can connect PyCharm Professional to it by heading over to the database tool window (View | Tool Windows | Database), then use the green ‘+’ button, and select Redshift as the  data source type. Then fill in the information for your instance:

Redshift Connection

Make sure that when you click the ‘test connection’ button you get a ‘connection successful’ notification. If you don’t, make sure that you’ve correctly configured your Redshift cluster’s VPC to allow connections from 0.0.0.0/0 on port 5439.

Now that we’ve connected PyCharm to the Redshift cluster, we can create the tables for Amazon’s example data. Copy the first code listing from here, and paste it into the SQL console that was opened in PyCharm when you connected to the database. Then execute it by pressing Ctrl + Enter, when PyCharm asks which query to execute, make sure to select the full listing. Afterward, you should see all the tables in the database tool window:

Database Tool Window

To load the sample data, go back to the query window, and use the Redshift ‘load’ command to load data from an Amazon S3 bucket into the database:

Redshift Query

The IAM role identifier should be the identifier for the IAM role you’ve created for your Redshift cluster in the second step in the Amazon tutorial. If everything goes right, you should have about 50,000 rows of data in your users table after the command completes.

Loading Redshift Data into a Pandas Dataframe

So let’s get started with the Python code! In our example we’ll use Pandas, Matplotlib, and Seaborn. The easiest way to get all of these installed is by using Anaconda, get the Python 3 version from their website. After installing, we need to choose Anaconda as our project interpreter:

Anaconda Root Env

If you can’t find Anaconda in the dropdown, you can click the settings “gear” button, and then select ‘Add Local’ and find your Anaconda installation on your disk. We’re using the root Anaconda environment without Conda, as we will depend on several scientific libraries which are complicated to correctly install in Conda environments.

Pandas relies on SQLAlchemy to load data from an SQL data source. So let’s use the PyCharm package manager to install sqlalchemy: use the green ‘+’ button next to the package list and find the package. To make SQLAlchemy work well with Redshift, we’ll need to install both the postgres driver, and the Redshift additions. For postgres, you can use the PyCharm package manager to install psycopg2. Then we need to install sqlalchemy-redshift to teach SQLAlchemy the specifics of working with a Redshift cluster. This package is unfortunately not available in the default Anaconda repository, so we’ll need to add a custom repository.

To add a custom repository click the ‘Manage Repositories’ button, and then use the green ‘+’ icon to add the ‘conda-forge’ channel. Afterwards, close the ‘Manage Repositories’ screen, and install sqlalchemy-redshift. Now that we’ve done that, we can start coding!

Conda Forge Channel

To show how it’s done, let’s analyze something simple in Amazon’s dataset, the users dataset holds fictional users, and then indicates for every user if they like certain types of entertainment. Let’s see if there’s any correlation between the types of entertainment.

As always, the full code is available on GitHub.

Let’s open a new Python file, and start our analysis. At first, we need to load our data. Redshift is accessed just like a regular PostgreSQL database, just with a slightly different connection string to use the redshift driver:

connstr = 'redshift+psycopg2://<username>:<password>@<your cluster>.redshift.amazonaws.com:5439/<database name>'

Also note that Redshift by default listens on port 5439, rather than Postgres’ 5432.

After we’ve connected we can use Pandas’ standard way to load data from an SQL database:

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine(connstr) 

with engine.connect() as conn, conn.begin():
   df = pd.read_sql("""
           select
             likesports as sports,
             liketheatre as theater,
             likeconcerts as concerts,
             likejazz as jazz,
             likeclassical as classical,
             likeopera as opera,
             likerock as rock,
             likevegas as vegas,
             likebroadway as broadway,
             likemusicals as musicals
           from users;""", conn)

The dataset holds users’ preferences as False, None, or True. Let’s interpret this as True being a ‘like’, None being ambivalent, and False being a dislike. To make a correlation possible, we should convert this into numeric values:

# Map dataframe to have 1 for 'True', 0 for null, and -1 for False
def bool_to_numeric(x):
   if x:
       return 1
   elif x is None:
       return 0
   else:
       return -1

df = df.applymap(bool_to_numeric)

And now we’re ready to calculate the correlation matrix, and present it. To present it we’ll use Seaborn’s heatmap. We’ll also create a mask to only show the bottom half of the correlation matrix (the top half mirrors the bottom).

import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

# Calculate correlations
corr = df.corr()

mask = np.zeros_like(corr)
mask[np.triu_indices_from(mask)] = True

sns.heatmap(corr,
           mask=mask,
           xticklabels=corr.columns.values,
           yticklabels=corr.columns.values)

plt.xticks(rotation=45)
plt.yticks(rotation=45)

plt.tight_layout()
plt.show()

After running this code, we can see that there are no correlations in the dataset:

Redshift Example Correlations

Which is strong evidence for Amazon’s sample dataset being a sample dataset. QED.

Fortunately, PyCharm also works great with real datasets in Redshift. Let us know in the comments what data you’re interested in analyzing!

August 16, 2017 11:45 AM

August 15, 2017


Stéphane Wirtel

PythonFOSDEM 2018

Because I want to be ahead of schedule this year with the organization of PythonFOSDEM 2018, I have worked on the web site. The Call for Proposals will be announced once we have the “Go” from FOSDEM. In fact, for 2018, I do not know if we will have a room at FOSDEM, because during PythonFOSDEM 2017 we received sponsorship from Facebook and the organizers of FOSDEM were angry, because the sponsorship was for 1000 free beers ;-)

August 15, 2017 07:00 PM


PyCharm

Support a Great Partnership: PyCharm and Django Team up Again

Last June (2016) JetBrains PyCharm partnered with the Django Software Foundation to generate a big boost to Django fundraising. The campaign was a huge success. Together we raised a total of $50,000 for the Django Software Foundation!

This year we hope to repeat that success. During the two-week campaign, buy a new PyCharm Professional Edition individual license with a 30% discount code, and all the money raised will go to the DSF’s general fundraising and the Django Fellowship program.

Promotion details

Up until Aug 28th, you can effectively donate to Django by purchasing a New Individual PyCharm Professional annual subscription at 30% off. It’s very simple:

1. When buying a new annual PyCharm subscription in our e-store, on the checkout page, click “Have a discount code?”.
2. Enter the following 30% discount promo code:
ISUPPORTDJANGO
Alternatively, just click this shortcut link to go to the e-store with the code automatically applied.
3. Fill in the other required fields on the page and click the “Place order” button.

All of the income from this promotion code will go to the DSF fundraising campaign 2017 – not just the profits, but actually the entire sales amount including taxes, transaction fees – everything. The campaign will help the DSF to maintain the healthy state of the Django project and help them continue contributing to their different outreach and diversity programs.

Read more details on the special promotion page.


“Django has grown to be a world-class web framework, and coupled with PyCharm’s Django support, we can give tremendous developer productivity,” says Frank Wiles, DSF President. “Last year JetBrains was a great partner for us in support of raising money for the Django Software Foundation, on behalf of the community, I would like to extend our deepest thanks for their generous help. Together we hope to make this a yearly event!”

If you have any questions, get in touch with Django at fundraising@djangoproject.com or JetBrains at sales@jetbrains.com.

August 15, 2017 05:49 PM


Django Weblog

Support a Great Partnership: PyCharm and Django Team up Again

Last June (2016) JetBrains PyCharm partnered with the Django Software Foundation to generate a big boost to Django fundraising. The campaign was a huge success. Together we raised a total of $50,000 for the Django Software Foundation!

This year we hope to repeat that success. During the two-week campaign, buy a new PyCharm Professional Edition individual license with a 30% discount code, and all the money raised will go to the DSF’s general fundraising and the Django Fellowship program.

Promotion details

Up until Aug 28th, you can effectively donate to Django by purchasing a New Individual PyCharm Professional annual subscription at 30% off. It’s very simple:

  1. When buying a new annual PyCharm subscription in our e-store, on the checkout page, click “Have a discount code?”.
  2. Enter the following 30% discount promo code:
    IDONATETODJANGO

Alternatively, just click this shortcut link to go to the e-store with the code automatically applied.

  3. Fill in the other required fields on the page and click the “Place order” button.

All of the income from this promotion code will go to the DSF fundraising campaign 2017 – not just the profits, but actually the entire sales amount including taxes, transaction fees – everything. The campaign will help the DSF to maintain the healthy state of the Django project and help them continue contributing to their different outreach and diversity programs.

Read more details on the special promotion page.

“Django has grown to be a world-class web framework, and coupled with PyCharm’s Django support, we can give tremendous developer productivity,” says Frank Wiles, DSF President. “Last year JetBrains was a great partner for us in support of raising money for the Django Software Foundation, on behalf of the community, I would like to extend our deepest thanks for their generous help. Together we hope to make this a yearly event!”

If you have any questions, get in touch with Django at fundraising@djangoproject.com or JetBrains at sales@jetbrains.com.

August 15, 2017 02:38 PM


Fabio Zadrozny

PyDev 5.9.2 released (Debugger improvements, isort, certificate)

PyDev 5.9.2 is now available for download.

This version now integrates the performance improvements which were done in PyDev.Debugger for Python 3.6 (these use the new hook made available by Python 3.6 and change bytecode to add calls to the debugger, so that there's less overhead during debugging -- note that this only really takes place if breakpoints are added before a given piece of code is loaded; adding or removing breakpoints afterwards falls back to the previous approach of tracing).

Another nice feature in this release is that isort (https://github.com/timothycrosley/isort) can be used as the default engine for sorting imports (needs to be configured in preferences > PyDev > Editor > Code Style > Imports -- note that at that same preferences dialog you may save the settings to a project, not only globally).

There were also a number of bug-fixes... in particular, a really nasty one that prevented text searches from working if the user had another plugin which used a different version of Lucene. http://www.pydev.org has more details on the changes.

This is also the first release which is signed with a proper certificate (provided by Comodo) -- so, it's nice that Eclipse won't complain that the plugin is not signed when it's being installed. Although I discovered that it isn't as useful as I thought: it does work as intended for Eclipse plugins, but for Windows, even signing the LiClipse installer will show a dialog for users (there's a more expensive version with extended validation which could be used, but I didn't go for that one). On Mac OS I haven't even tried to sign, as it seems Comodo certificates are worthless there -- the only choice is having a development subscription from Apple and using a certificate Apple gives you (the verification they do seems compatible with what Comodo provides, which uses a DUNS number, so it's apparently just a point of them wanting more $$$/control, not really being more secure). So, currently, Mac users will still use unsigned binaries (the sha256 is provided for users who want to actually check that what they download is what's being distributed).

August 15, 2017 11:34 AM


Martin Fitzpatrick

KropBot: Multiplayer Internet-controlled robot

KropBot is a little multiplayer robot you can control over the internet. Co-operate with random internet strangers to drive around in circles and into walls.

If it is online, you can drive the KropBot yourself!

KropBot is dead. 15 minutes after posting to Planet Python, Kropbot was mercilessly driven down a flight of stairs. He is no more, he is kaput. He is an ex-robot.

Requirements

If you already have a working 2-motor robot platform you can skip straight to the code. The code shown below will work with any Raspberry Pi with WiFi (Zero W recommended) and MotorHAT.

I also needed —

Build

Chassis

The chassis I used came pre-constructed, with motors in place, and seems to be the deconstructed base of a toy tank. The sales photo slightly oversells its capabilities.

This is unfortunately not what it looks like

There was no AA battery holder included, just a space behind a flap labelled 4.8V. The space measured the size of 4xAA batteries (giving 6V total) but the 4xAA battery holder I ordered didn’t fit. However, a 6xAA battery pack I had could be cut down to size by lopping off 2 holders and rewiring. Save yourself the hassle and get a chassis with a battery holder.

Battery compartment

The pack is still a bit too deep, but the door can be closed with a screwdriver to wedge it shut.

Battery compartment closed

An ON/OFF switch is provided in the bottom of the case which I wanted to be able to use to switch off the motor power (to save battery life, when the Pi wasn’t running). The power leads were fed through to the upper side, but were too short to reach the switch, so these were first extended before being soldered to the switch.

Internal wiring

MotorHAT

The MotorHAT (here using a cheap knock-off) is an extension board for controlling 4 motors (or stepper motors) from a Pi, with speed and direction control. This is a shortcut, but you could also use L293D motor drivers together with PWM on the GPIO pins for speed control.

Once wired into the power supply (AA batteries) the MotorHAT power LED should light up.

The MotorHAT

The motor supply is wired in separately to the HAT, and the board keeps this isolated from the Pi supply/GPIO. The + lead is wired in through the switch as already described.

Next the motors are wired into the terminals, using the outer terminals for each motor. The left motor goes on M1 and the right on M2. Getting the wires the right way around (so forward=forward) is a case of trial and error: try one way, then reverse it if it’s wrong.

The MotorHAT

Once that’s wired up, you can pop the MotorHAT on top of the Pi. Make sure it goes the right way around — the MotorHAT should be over the Pi board, on the top side.

Pi Camera

The Pi Camera unit attaches to the camera port on the Pi using the cable. If using a Pi Zero, you need a specific connector for the camera which is narrower at the Pi end. Make sure the plastic widget is on the port to hold the cable, line the cable up with the metal contacts facing up, and push it in.

Camera cable

The camera assembly isn’t fancy, it’s just tacked to the edge of the base with some card and tape.

Camera

Once that’s one you can power up the Pi with a powerbank.

Pi with attached MotorHAT

The code

The robot control code is split into 3 parts —

  1. The robot control code, that handles the inputs, moves the robot and streams the camera
  2. The server which receives inputs from the (multiple) clients and forwards them to the robot in batches, and receives the single camera images from the robot and broadcasts them to the clients.
  3. The client code which sends user inputs to the server, and renders the images being sent in return.

The server isn’t strictly necessary in this setup — the Pi itself could happily run a webserver to serve the client interface, and then directly interface with clients via websockets. However, that would require the robot to be accessible on the internet and streaming the camera images to multiple clients would add quite a bit of overhead. Since we’re hoping to support a largish number (>25) of simultaneous clients it’s preferable to offload that work somewhere other than the Pi. The downside is that this two-hop approach adds some control/refresh delay and makes things slightly less reliable (more later).

The full source is available on Github.

The client app.js

The browser part, which provides the user interface for control and a display for streaming images from the robot, was implemented in AngularJS to keep things simple. Just the controller code is shown below.

KropBot Client

var robotApp = angular.module('robotApp', []);
robotApp.controller('RobotController', function ($scope, $http, $interval, socket) {

    var uuidv4 = function () {
        return 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, function(c) {
            var r = Math.random() * 16 | 0, v = c == 'x' ? r : (r & 0x3 | 0x8);
            return v.toString(16);
        });
    }

    $scope.directions = [4,5,6,7,8,1,2,3];
    $scope.uuid = uuidv4();
    $scope.data = {
        selected: null,
        direction: null,
        magnitude: 0,
        n_controllers: 0,
        total_counts: {}
    }
    $scope.min = window.Math.min;

    socket.emit('client_ready');

    $scope.set_direction = function(i) {
        $scope.data.selected = i;
        $scope.send_instruction()
    }

    $scope.send_instruction = function() {
        socket.emit('instruction', {
            user: $scope.uuid,
            direction: $scope.data.selected,
        })
    }

    // Instruction timeout is 3 seconds; ping to ensure we stay live
    setInterval($scope.send_instruction, 1500);

The main data receive block accepts all data from the server and assigns it onto the $scope to update the interface. Image data from the camera is sent as raw bytes, so to get it into an img element we need to convert it to a URL, here done via a Blob object URL. Doing the conversion on the robot/server instead (as a base64-encoded data URL) would avoid every client having to perform this operation, but base64 encoding increases the size of the data transmitted by about 1/3.

        // Receive updated signal via socket and apply data
        socket.on('updated_status', function (data) {
            $scope.data.direction = data.direction;
            $scope.data.magnitude = data.magnitude;
            $scope.data.n_controllers = data.n_controllers
            $scope.data.total_counts = data.total_counts;
          });

        // Receive updated signal via socket and apply data
        socket.on('updated_image', function (data) {
            blob = new Blob([data], {type: "image/jpeg"});
            $scope.data.imageUrl = window.URL.createObjectURL(blob);
          });

 });

Finally, we define a set of wrappers to integrate sockets in our AngularJS application. This ensures that changes to the scope that are triggered by websocket events are detected — and the view updated.

robotApp.factory('socket', function ($rootScope) {
  var socket = io.connect();
  return {
    on: function (eventName, callback) {
      socket.on(eventName, function () {
        var args = arguments;
        $rootScope.$apply(function () {
          callback.apply(socket, args);
        });
      });
    },
    emit: function (eventName, data, callback) {
      socket.emit(eventName, data, function () {
        var args = arguments;
        $rootScope.$apply(function () {
          if (callback) {
            callback.apply(socket, args);
          }
        });
      })
    }
  };
});

The server app.py

Note the (optional) use of ROBOT_WS_SECRET to change the endpoints used for communicating with the robot. This is a simple way to prevent a client connecting, pretending it’s the robot, and broadcasting naughty images to everyone. If you set this value, just make sure you set it to the same thing on both the robot and server (via the environment variable) or they won’t be able to talk to one another.

import os
import time

from flask import Flask
from flask_socketio import SocketIO, join_room

app = Flask(__name__)
app.config.from_object(os.environ.get('APP_SETTINGS', 'config.Config'))
app.config['SQLALCHEMY_TRACK_MODIFICATIONS'] = False
app.secret_key = app.config['SECRET_KEY']
socketio = SocketIO(app)

# Use a secret WS endpoint for robot.
robot_ws_secret = app.config['ROBOT_WS_SECRET']

# Buffer incoming instructions and emit them directly to the robot
# on each image cycle.
instruction_buffer = {}
latest_robot_state = {}

INSTRUCTION_DURATION = 3


@app.route('/')
def index():
    """
    Serve the static client interface (index.html).
    :return: raw index.html response
    """
    return app.send_static_file('index.html')

To stop dead clients from continuing to control the robot, we need to expire instructions after INSTRUCTION_DURATION seconds have elapsed.

def clear_expired_instructions():
    """
    Remove all expired instructions from the buffer.

    Instructions are expired if their age > INSTRUCTION_DURATION. This needs to be low
    enough that the robot stops performing a behaviour when a client leaves, but
    high enough that an active client's instructions are not cleared due to lag.
    """
    global instruction_buffer
    threshold = time.time() - INSTRUCTION_DURATION
    instruction_buffer = {k: v for k, v in instruction_buffer.items() if v['timestamp'] > threshold}

All communication between the server, the clients and robot is handled through websockets. We assign clients to a specific socketIO “room” so we can communicate with them in bulk, without also sending the same data to the robot.

@socketio.on('client_ready')
def client_ready_join_room(message):
    """
    Receive the ready instruction from (browser) clients and assign them to the client room.
    """
    join_room('clients')

User instructions are received regularly from clients (timed-ping) with a unique user ID being used to ensure only one instruction is stored per client. Instructions are timestamped as they arrive, so they can be expired later.

@socketio.on('instruction')
def user_instruction(message):
    """
    Receive and buffer direction instruction from client.
    :return:
    """

    # Perform validation on inputs, direction must be in range 1-8 or None. Anything else
    # is interpreted as None (=STOP) from that client.
    message['direction'] = message['direction'] if message['direction'] in range(1, 9) else None

    instruction_buffer[message['user']] = {
        'direction': message['direction'],
        'timestamp': int(time.time())
    }

The robot camera stream and instruction/status loop run separately, and connect to different sockets. Updates to status are forwarded onto the clients and responded to with the latest instruction buffer. The camera image is forwarded onwards with no response.

@socketio.on('robot_update_' + robot_ws_secret)
def robot_update(message):
    """
    Receive the robot's current status message (dict) and store for future
    forwarding to clients. Respond with the current instruction buffer directions.
    :param message: dict of robot status
    :return: list of directions (all clients)
    """
    # Forward latest state to clients.
    socketio.emit('updated_status', message, json=True, room='clients')
    # Clear expired instructions and return the remainder to the robot.
    clear_expired_instructions()
    return [v['direction'] for v in instruction_buffer.values()]


@socketio.on('robot_image_' + robot_ws_secret)
def robot_image(data):
    """
    Receive latest image data and broadcast to clients
    :param data:
    """
    # Forward latest camera image
    socketio.emit('updated_image', data, room='clients')


if __name__ == '__main__':
    socketio.run(app)

The robot robot.py

The robot is implemented in Python 3 and runs locally on the Pi (see later for instructions to run automatically at startup).

import atexit
from collections import Counter
from concurrent import futures
from io import BytesIO
import math, cmath
import os
import time

from Adafruit_MotorHAT import Adafruit_MotorHAT, Adafruit_DCMotor
from picamera import PiCamera
from socketIO_client import SocketIO

The following constants can be used to configure the behaviour of the robot, but the values below were found to produce a stable and reasonably responsive bot.

SPEED_MULTIPLIER = 200
UPDATES_PER_SECOND = 5
CAMERA_QUALITY = 10
CAMERA_FPS = 5

# A store of incoming instructions from clients, stored as list of client directions
instructions = []

# Use a secret WS endpoint for robot.
robot_ws_secret = os.getenv('ROBOT_WS_SECRET', '')

The robot uses 8 direction values for the compass points and intermediate directions. The exact behaviour of each direction is defined in the DIRECTIONS dictionary, as a pair of (direction, speed multiplier) tuples, one for each of the two motors.

# Conversion from numeric inputs to motor instructions + multipliers. The multipliers
# are adjusted for each direction, e.g. forward is full-speed, turn is half.
DIRECTIONS = {
    1: ((Adafruit_MotorHAT.FORWARD, 0.75), (Adafruit_MotorHAT.FORWARD, 0.5)),
    2: ((Adafruit_MotorHAT.FORWARD, 0.5), (Adafruit_MotorHAT.BACKWARD, 0.5)),
    3: ((Adafruit_MotorHAT.BACKWARD, 0.75), (Adafruit_MotorHAT.BACKWARD, 0.5)),
    4: ((Adafruit_MotorHAT.BACKWARD, 1), (Adafruit_MotorHAT.BACKWARD, 1)),
    5: ((Adafruit_MotorHAT.BACKWARD, 0.5), (Adafruit_MotorHAT.BACKWARD, 0.75)),
    6: ((Adafruit_MotorHAT.BACKWARD, 0.5), (Adafruit_MotorHAT.FORWARD, 0.5)),
    7: ((Adafruit_MotorHAT.FORWARD, 0.5), (Adafruit_MotorHAT.FORWARD, 0.75)),
    8: ((Adafruit_MotorHAT.FORWARD, 1.5), (Adafruit_MotorHAT.FORWARD, 1.5)),
}

# Initialize motors, and define left and right controllers.
motor_hat = Adafruit_MotorHAT(addr=0x6f)
left_motor = motor_hat.getMotor(1)
right_motor = motor_hat.getMotor(2)


def turnOffMotors():
    """
    Shutdown motors and unset handlers.
    Called on exit to ensure the robot is stopped.
    """
    left_motor.run(Adafruit_MotorHAT.RELEASE)
    right_motor.run(Adafruit_MotorHAT.RELEASE)

The average of all client’s input direction angles are combined using complex math to convert the angles into vectors, sum them and calculate back the angle of the resulting single point. You could also do this using the sum of sines and cosines.

def average_radians(list_of_radians):
    """
    Return average of a list of angles, in radians, and its amplitude.

    We calculate a set of vectors for each angle, using a fixed distance.
    Add up the sum of the x, y of the resulting vectors.
    Work back to an angle + get a magnitude.

    :param list_of_radians:
    :return:
    """
    vectors = [cmath.rect(1, angle) if angle is not None else cmath.rect(0, 0)
               # length 1 for each vector; or 0,0 for null (stopped)
               for angle in list_of_radians]

    vector_sum = sum(vectors)
    return cmath.phase(vector_sum), abs(vector_sum)
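
The sines-and-cosines variant mentioned above would look something like this (an illustrative sketch; average_radians_trig is an invented name, not part of the robot's code):

import math

def average_radians_trig(list_of_radians):
    # Treat each angle as a unit vector (cos a, sin a); a None entry means
    # a stopped client and contributes a zero vector, as in average_radians.
    xs = sum(math.cos(a) for a in list_of_radians if a is not None)
    ys = sum(math.sin(a) for a in list_of_radians if a is not None)
    return math.atan2(ys, xs), math.hypot(xs, ys)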


def to_radians(d):
    """
    Convert 7-degrees values to radians.
    :param d:
    :return: direction in radians
    """
    return d * math.pi / 4 if d is not None else None


def to_degree7(r):
    """
    Convert radians to 'degrees' with a 0-7 scale.
    :param r:
    :return: direction in 7-value degrees
    """
    return round(r * 4 / math.pi)


def map1to8(v):
    """
    Limit v to the range 1-8 or None, with 0 being converted to 8 (straight ahead).

    This is necessary because the back-calculation to degree7 can produce negative
    values, yet the input to calculate_average_instruction must use 1-8 to weight
    forward instructions correctly.
    :param v:
    :return: v, in the range 1-8 or None
    """
    if v is None or v > 0:
        return v
    return v + 8  # if 0, return 8

We iterate over all the client instructions, calculate an average angle and magnitude, and count totals for each direction (to be sent back to the clients).

def calculate_average_instruction():
    """
    Return a dictionary of counts for each direction option in the current
    instructions and the direction with the maximum count.

    Directions are stored in numeric range 0-7, we first convert these imaginary
    degrees to radians, then calculate the average radians by adding vectors.
    Once we have that value in radians we can convert back to our own scale
    which the robot understands. The amplitude value gives us a speed.

    0 = Forward
    7/1 = Forward left/right (slight)
    6/2 = Turn left right (stationary)
    5/3 = Backwards left/right (slight)
    4 = Backwards

    :return: dict total_counts, direction
    """

    # If instructions remaining, calculate the average.
    if instructions:
        directions_v, direction_rads = zip(*[(d, to_radians(d)) for d in instructions])
        total_counts = Counter([map1to8(v) for v in directions_v])

        rad, magnitude = average_radians(direction_rads)

        if magnitude < 0.05:
            # A near-zero vector sum means the inputs cancel out: all stop.
            return {
                'total_counts': total_counts,
                'direction': None,
                'magnitude': 0
            }

        return {
            'total_counts': total_counts,
            'direction': map1to8(to_degree7(rad)),
            'magnitude': magnitude
        }

    else:
        return {
            'total_counts': {},
            'direction': None,
            'magnitude': 0
        }

The average control data is converted to motor instructions using the DIRECTIONS dictionary defined earlier. Because of the way that multiple client instructions are combined and then multiplied, motor speeds can end up > 255, so we additionally need to cap these.

def control_robot(control):
    """
    Takes current robot control instructions and apply to the motors.
    If direction is None, all-stop, otherwise calculates a speed
    for each motor using a combination of DIRECTIONS, magnitude
    and SPEED_MULTIPLIER, capped at 255.
    :param control:
    """
    if control['direction'] is None:
        # All stop.
        left_motor.setSpeed(0)
        right_motor.setSpeed(0)
        return

    direction = int(control['direction'])
    left, right = DIRECTIONS[direction]
    magnitude = control['magnitude']

    left_motor.run(left[0])
    left_speed = int(left[1] * magnitude * SPEED_MULTIPLIER)
    left_speed = min(left_speed, 255)
    left_motor.setSpeed(left_speed)

    right_motor.run(right[0])
    right_speed = int(right[1] * magnitude * SPEED_MULTIPLIER)
    right_speed = min(right_speed, 255)
    right_motor.setSpeed(right_speed)

We collect incoming messages and store them in the instruction list. This is emptied out on each loop before the callback is expected to hit this function — we delete it in the main loop in case the callback fails.

def on_new_instruction(message):
    """
    Handler for incoming instructions from clients. Instructions are received, combined
    and expired on the server, so only active instructions (one per client) are received
    here.
    :param message: dict of all current instructions from all clients.
    :return:
    """
    instructions.extend(message)

Streaming the images from the robot is heavy work, so we spin it off into its own process, using a separate websocket to the server. The worker initializes the camera, opens a socket to the server and then iterates over the Pi camera capture_continuous iterator, a never-ending generator of images. Image data is emitted in raw bytes.

def streaming_worker():
    """
    A self-contained worker for streaming the Pi camera over websockets to the server
    as JPEG images. Initializes the camera, opens the websocket then enters a continuous
    capture loop, with each snap transmitted.
    :return:
    """
    camera = PiCamera()
    camera.resolution = (200, 300)
    camera.framerate = CAMERA_FPS

    with BytesIO() as stream, SocketIO('https://kropbot.herokuapp.com', 443) as socketIO:
        # capture_continuous is an endless iterator. Using video port + low quality for speed.
        for _ in camera.capture_continuous(stream, format='jpeg', use_video_port=True, quality=CAMERA_QUALITY):
            stream.truncate()   # drop leftover bytes from a previous, longer frame
            stream.seek(0)      # rewind to read the complete JPEG back out
            data = stream.read()
            socketIO.emit('robot_image_' + robot_ws_secret, bytearray(data))
            stream.seek(0)      # rewind again so the next capture overwrites the buffer

The main loop controls the sending of robot status updates to the server, and the receiving of client instructions in return. The loop is throttled to UPDATES_PER_SECOND by computing a target end time for each iteration and sleeping away whatever remains of it. This is required to stop flooding the server with useless packets of data.

if __name__ == "__main__":
    # Register our function to disable motors when we shutdown.
    atexit.register(turnOffMotors)
    with futures.ProcessPoolExecutor() as executor:
        # Execute our camera streamer 'streaming_worker' in a separate process.
        # This runs continuously until exit.
        future = executor.submit(streaming_worker)

        with SocketIO('https://kropbot.herokuapp.com', 443) as socketIO:
            while True:
                current_time = time.time()
                lock_time = current_time + 1.0 / UPDATES_PER_SECOND
                # Calculate current average instruction based on inputs,
                # then perform the action.
                instruction = calculate_average_instruction()
                control_robot(instruction)
                instruction['n_controllers'] = len(instructions)
                # on_new_instruction is a callback to handle the server's response.
                socketIO.emit('robot_update_' + robot_ws_secret, instruction, on_new_instruction)
                # Empty all current instructions before accepting any new ones,
                # ensuring that if we lose contact with the server we stop.
                del instructions[:]
                socketIO.wait_for_callbacks(5)

                # Throttle the updates sent out to UPDATES_PER_SECOND (very roughly).
                time.sleep(max(0, lock_time - time.time()))

Starting up

Server

The code repository contains the files needed to get this set up on a Heroku host. First we need to create the app on Heroku, and add it as a git remote of our local repository.

heroku create <app-name>
heroku git:remote -a <app-name>

The app name will be the subdomain your app is hosted under: https://<app-name>.herokuapp.com. To push the code up and start the application you can now use:

git push heroku master

Note that if you are going to set a ROBOT_WS_SECRET on your robot (optional, but recommended) you will need to set this on Heroku too. Random letters and numbers are fine.

heroku config:set ROBOT_WS_SECRET=<your-ws-secret>

Robot

Copy the robot code over to your Pi, e.g. using scp

scp robot.py pi@raspberrypi.local:/home/pi

You can then SSH in with ssh pi@raspberrypi.local. First install the PiCamera and then MotorHAT libraries from Adafruit (these work for the non-official boards too).

git clone https://github.com/adafruit/Adafruit-Motor-HAT-Python-Library.git
cd Adafruit-Motor-HAT-Python-Library
sudo apt-get install python3-dev python3-picamera
sudo python3 setup.py install

You can then install the remaining Python dependencies with pip, e.g.

sudo pip3 install git+https://github.com/wwqgtxx/socketIO-client.git

We are using a non-standard version of SocketIO_client in order to get support for binary data streaming. Otherwise we would need to base64-encode the image data on the robot, increasing its size and adding load.

Finally, we need to enable the camera on the Pi. To do this, open the Raspberry Pi configuration tool and enable the camera interface:

raspi-config

Once the camera is enabled (you may need to restart), you can start the robot with:

python3 robot.py

The simplest way to get the robot starting on each boot is to use cron. Edit your crontab using crontab -e and add a line like the following, replacing your ROBOT_WS_SECRET value:

@reboot ROBOT_WS_SECRET=<your_ws_value> python3 /home/pi/robot.py

Save the file and exit back to the command line. If you run sudo reboot your Pi will restart, and the robot controller will start up automatically. Open your browser to your hosted Heroku app and as soon as the robot is live the camera stream should begin.

Notes

The configuration of the robot (fps/instructions per second) has been chosen to give decent responsiveness while not overloading the Heroku server. If you’re running on a beefier host you can probably increase these values a bit.

Streaming JPEG images is very inefficient (since each frame is encoded independently, even a static scene uses the full data rate). Streaming an actual video format, e.g. h264, would allow a huge improvement in quality at the same data rate. However, this complicates the client (and robot) a bit, since we would need to send the initial stream spec to each new client and decode a raw h264 stream on the client. Something for a rainy day.

RIP Kropbot

On 15th August 2017 at approximately 13.05 EST KropBot was driven down a flight of stairs by a kind internet stranger. He is no more.

This, unfortunately, is what it looks like.

August 15, 2017 09:00 AM


Talk Python to Me

#125 Django REST framework and a new API star is born

APIs were once the new and enabling thing in technology. Today they are table-stakes. And getting them right is important. Today we'll talk about one of the most popular and mature API frameworks in Django REST Framework. You'll meet the creator, Tom Christie, and talk about the framework, API design, and even his successful take on funding open source projects.

But Tom is not done here. He's also creating the next generation API framework that fully embraces Python 3's features, called API Star.

Links from the show:

Django REST framework: django-rest-framework.org
API Star: github.com/tomchristie/apistar
Tom on Twitter: @_tomchristie

August 15, 2017 08:00 AM


Daniel Bader

Unpacking Nested Data Structures in Python

A tutorial on Python’s advanced data unpacking features: How to unpack data with the “=” operator and for-loops.

Have you ever seen Python’s enumerate function being used like this?

for (i, value) in enumerate(values):
   ...

In Python, you can unpack nested data structures in sophisticated ways, but the syntax might seem complicated: Why does the for statement have two variables in this example, and why are they written inside parentheses?

This article answers those questions and many more. I wrote it in two parts:

  • Part #1 covers unpacking and the "=" assignment operator.
  • Part #2 covers unpacking in conjunction with for-loops.

Ready? Let’s start with a quick primer on the “BNF” syntax notation used in the Python language specification.

BNF Notation – A Primer for Pythonistas

This section is a bit technical, but it will help you understand the examples to come. The Python 2.7 Language Reference defines all the rules for the assignment statement using a modified form of Backus Naur notation.

The Language Reference explains how to read BNF notation. In short:

  • each rule is written as name ::= definition
  • | separates alternatives
  • ( ) groups elements together
  • * means zero or more repetitions of the preceding element
  • + means one or more repetitions of the preceding element
  • [ ] encloses an optional element
  • text in quotes, like "=" or ",", is a literal character

Here is the complete grammar for the assignment statement in Python 2.7. It looks a little complicated because Python allows many different forms of assignment:

An assignment statement consists of

  • one or more (target_list "=") groups
  • followed by either an expression_list or a yield_expression
assignment_stmt ::= (target_list "=")+ (expression_list | yield_expression)

A target list consists of

  • a target
  • followed by zero or more ("," target) groups
  • followed by an optional trailing comma
target_list ::= target ("," target)* [","]

Finally, a target consists of any of the following

  • a variable name
  • a nested target list enclosed in ( ) or [ ]
  • a class or instance attribute
  • a subscripted list or dictionary
  • a list slice
target ::= identifier
           | "(" target_list ")"
           | "[" [target_list] "]"
           | attributeref
           | subscription
           | slicing

As you’ll see, this syntax allows you to take some clever shortcuts in your code. Let’s take a look at them now:

#1 – Unpacking and the “=” Assignment Operator

First, you’ll see how Python’s “=” assignment operator iterates over complex data structures. You’ll learn about the syntax of multiple assignments, recursive variable unpacking, and starred targets.

Multiple Assignments in Python:

Multiple assignment is a shorthand way of assigning the same value to many variables. An assignment statement usually assigns one value to one variable:

x = 0
y = 0
z = 0

But in Python you can combine these three assignments into one expression:

x = y = z = 0
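
One detail worth knowing before you use this with mutable values: all targets end up bound to the same object, not to copies of it:

>>> x = y = []
>>> x.append(1)
>>> y
[1]

This is safe with immutable values like 0, but chained assignment to a list or dict shares that single object between all the names.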

Recursive Variable Unpacking:

I’m sure you’ve written [ ] and ( ) on the right side of an assignment statement to pack values into a data structure. But did you know that you can literally flip the script by writing [ ] and ( ) on the left side?

Here’s an example:

[target, target, target, ...] =
or
(target, target, target, ...) =

Remember, the grammar rules allow [ ] and ( ) characters as part of a target:

target ::= identifier
           | "(" target_list ")"
           | "[" [target_list] "]"
           | attributeref
           | subscription
           | slicing

Packing and unpacking are symmetrical and they can be nested to any level. Nested objects are unpacked recursively by iterating over the nested objects and assigning their values to the nested targets.

Here’s what this looks like in action:

(a, b) = (1, 2)
# a == 1
# b == 2

(a, b) = ([1, 2], [3, 4])
# a == [1, 2]
# b == [3, 4]

(a, [b, c]) = (1, [2, 3])
# a == 1
# b == 2
# c == 3

Unpacking in Python is powerful and works with any iterable object. You can unpack:

  • tuples and lists
  • strings
  • dictionaries (you get the keys)
  • generators and other iterators
  • files (you get the lines)
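
A few quick examples of these (my own snippets; note the dictionary example assumes you don't care about key order, which is only guaranteed from Python 3.7 onwards):

>>> a, b, c = 'abc'                            # strings
>>> first, second = (n * n for n in (2, 3))    # generators
>>> key1, key2 = {'one': 1, 'two': 2}          # dicts unpack their keys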

Test Your Knowledge: Unpacking

What are the values of a, x, y, and z in the example below?

a = (x, y, z) = 1, 2, 3

Hint: this expression uses both multiple assignment and unpacking.

Starred Targets (Python 3.x Only):

In Python 2.x the number of targets and values must match. This code will produce an error:

x, y, z = 1, 2, 3, 4   # Too many values

Python 3.x introduced starred variables. Python first assigns values to the unstarred targets. After that, it forms a list of any remaining values and assigns it to the starred variable. This code does not produce an error:

x, *y, z = 1, 2, 3, 4
# y == [2,3]

Test Your Knowledge: Starred Variables

Is there any difference between the variables b and *b in these two statements? If so, what is it?

(a, b, c) = 1, 2, 3
(a, *b, c) = 1, 2, 3

#2 – Unpacking and for-loops

Now that you know all about target list assignment, it’s time to look at unpacking used in conjunction with for-loops.

In this section you’ll see how the for-statement unpacks data using the same rules as the = operator. Again, we’ll go over the syntax rules first and then we’ll look at a few hands-on examples.

Let’s examine the syntax of the for statement in Python:

for_stmt ::= "for" target_list "in" expression_list ":" suite
             ["else" ":" suite]

Do the symbols target_list and expression_list look familiar? You saw them earlier in the syntax of the assignment statement.

This has massive implications:

Everything you’ve just learned about assignments and nested targets also applies to for loops!

Standard Rules for Assignments:

Let’s take another look at the standard rules for assignments in Python. The Python Language Reference says:

The for statement is used to iterate over the elements of a sequence (such as a string, tuple or list) or other iterable objects … Each item, in turn, is assigned to the target list using the standard rules for assignments.

You already know the standard rules for assignments. You learned them earlier when we talked about the = operator. They are:

  • a single (unstarred) target simply gets the value assigned to it
  • a list of targets gets one item each: Python iterates over the value and assigns the items to the targets, applying these rules recursively to any nested target lists
  • a starred target (Python 3.x only) collects all the leftover items into a list

In the introduction, I promised I would explain this code:

for (i,value) in enumerate(values):
   ...

Now you know enough to figure it out yourself: enumerate(values) produces a sequence of (index, item) tuples, and the for statement unpacks each tuple into the target list (i, value) using the standard rules for assignments. The parentheses around the target list are purely optional, by the way; for i, value in enumerate(values): behaves exactly the same.

Examples:

I’ll finish by showing you a few more examples that use Python’s unpacking features with for-loops. Here’s some test data we’ll use in this section:

# Test data:
negative_numbers = (-1, -2, -3, -4, -5)
positive_numbers = (1, 2, 3, 4, 5)

The built-in zip function returns pairs of numbers:

>>> list(zip(negative_numbers, positive_numbers))
[(-1, 1), (-2, 2), (-3, 3), (-4, 4), (-5, 5)]

I can loop over the pairs:

for z in zip(negative_numbers, positive_numbers):
    print(z)

Which produces this output:

(-1, 1)
(-2, 2)
(-3, 3)
(-4, 4)
(-5, 5)

I can also unpack the pairs if I wish:

>>> for (neg, pos) in zip(negative_numbers, positive_numbers):
...     print(neg, pos)

-1 1
-2 2
-3 3
-4 4
-5 5

What about starred variables? This example finds a string’s first and last character. The underscore character is often used in Python when we need a dummy placeholder variable:

>>> animals = [
...    'bird',
...    'fish',
...    'elephant',
... ]

>>> for (first_char, *_, last_char) in animals:
...    print(first_char, last_char)

b d
f h
e t

Unpacking Nested Data Structures – Conclusion

In Python, you can unpack nested data structures in sophisticated ways, but the syntax might seem complicated. I hope that with this tutorial I’ve given you a clearer picture of how it all works. Here’s a quick recap of what we covered:

  • multiple assignments, which bind several names in a single statement
  • recursive unpacking of nested data structures using ( ) and [ ] target lists
  • starred targets in Python 3.x, which collect leftover items into a list
  • for-loops, which unpack each item using the same standard rules as the = operator

It pays off to go back to the basics and to read the language reference closely—you might find some hidden gems there!

August 15, 2017 12:00 AM

August 14, 2017


Continuum Analytics News

Five Organizations Successfully Fueling Innovation with Data Science

Tuesday, August 15, 2017
Christine Doig
Sr. Data Scientist, Product Manager

Data science innovation requires availability, transparency and interoperability. But what does that mean in practice? At Anaconda, it means providing data scientists with open source tools that facilitate collaboration; moving beyond analytics to intelligence. Open source projects are the foundation of modern data science and are popping up across industries, making it more accessible, more interactive and more effective. So, who’s leading the open source charge in the data science community? Here are five organizations to keep your eye on:

1. TaxBrain. TaxBrain is a platform that enables policy makers and the public to simulate and study the effects of tax policy reforms using open source economic models. Using the open source platform, anyone can plug in elements of the administration’s proposed tax policy to get an idea of how they would perform in the real world.


2. Recursion Pharmaceuticals. Recursion is a pharmaceutical company dedicated to finding remedies for rare genetic diseases. Its drug discovery assay is built on an open source software platform, combining biological science with machine learning techniques to visualize cell data and test drugs efficiently. This approach shortens the research and development process, reducing time to market for remedies to these rare genetic diseases. Their goal is to treat 100 diseases by 2026 using this method.

3. The U.S. Government. Under the previous administration, the U.S. government launched Data.gov, an open data initiative that offers more than 197K datasets for public use. This database exists, in part, thanks to the former U.S. chief data scientist, DJ Patil. He helped drive the government’s data science projects forward at the city, state and federal levels. Recently, concerns have been raised over the Data.gov portal, as certain information has started to disappear. Data scientists are keeping a sharp eye on the portal to ensure that these resources are updated and preserved for future innovative projects.

4. Comcast. Telecom and broadcast giant Comcast runs its projects on open source platforms to drive data science innovation in the industry.

For example, earlier this month, Comcast’s advertising branch announced they were creating a Blockchain Insights Platform to make the planning, targeting, execution and measurement of video ads more efficient. This data-driven, secure approach would be a game changer for the advertising industry, which eagerly awaits its launch in 2018.

5. DARPA. The Defense Advanced Research Projects Agency (DARPA) is behind the Memex project, a program dedicated to fighting human trafficking, which is a top mission for the defense department. DARPA estimates that, over two years, traffickers spent $250 million posting the temporary advertisements that fuel the human trafficking trade. Using an open source platform, Memex is able to index and cross-reference interactive and social media, text, images and video across the web. This allows them to find the patterns in web data that indicate human trafficking. Memex’s data science approach is already credited with generating at least 20 active cases and nine open indictments.

These are just some of the examples of open source-fueled data science turning industries on their head, bringing important data to the public and generally making the world a better place. What will be the next open source project to put data science in the headlines? Let us know what you think in the comments below!

August 14, 2017 06:12 PM


PyPy Development

Let's remove the Global Interpreter Lock

Hello everyone

The Python community has been discussing removing the Global Interpreter Lock for a long time. There have been various attempts at removing it: Jython and IronPython successfully removed it with the help of the underlying platform, while others, like the gilectomy, have yet to bear fruit. Since our February sprint in Leysin, we have experimented with the topic of GIL removal in the PyPy project. We believe that the work done in IronPython or Jython can be reproduced with only a bit more effort in PyPy. Compared to that, removing the GIL in CPython is a much harder topic, since it also requires tackling the problem of multi-threaded reference counting. See the section below for further details.

As we announced at EuroPython, what we have so far is a GIL-less PyPy which can run very simple multi-threaded, nicely parallelized, programs. At the moment, more complicated programs probably segfault. The remaining 90% (and another 90%) of the work is putting locks in strategic places so PyPy does not segfault during concurrent accesses to data structures.
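
The post does not show one, but the kind of program meant here is roughly the following sketch (my own example, not an official PyPy benchmark): pure-Python, CPU-bound threads that share no mutable state.

from threading import Thread

def crunch(n):
    # CPU-bound pure-Python work that touches only local variables.
    total = 0
    for i in range(n):
        total += i * i
    return total

# Four threads like these serialize behind the GIL on a normal interpreter;
# on a GIL-less interpreter they can run on four cores in parallel.
threads = [Thread(target=crunch, args=(10 ** 7,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()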

Since such work would complicate the PyPy code base and our day-to-day work, we would like to judge the interest of the community and the commercial partners to make it happen (we are not looking for individual donations at this point). We estimate a total cost of $50k, out of which we already have backing for about 1/3 (with a possible 1/3 extra from the STM money, see below). This would give us a good shot at delivering a good proof-of-concept working PyPy with no GIL. If we can get a $100k contract, we will deliver a fully working PyPy interpreter with no GIL as a release, possibly separate from the default PyPy release.

People asked several questions, so I'll try to answer the technical parts here.

What would the plan entail?

We've already done the work on the Garbage Collector to allow running multi-threaded programs in RPython. "All" that is left is adding locks on mutable data structures everywhere in the PyPy codebase. Since it would significantly complicate our workflow, we require real interest in that topic, backed up by commercial contracts, in order to justify the added maintenance burden.

Why did the STM effort not work out?

STM was a research project that proved that the idea is possible. However, the amount of user effort that is required to make programs run in a parallelizable way is significant, and we never managed to develop tools that would help in doing so. At the moment we're not sure if more work spent on tooling would improve the situation or if the whole idea is really doomed. The approach also ended up adding significant overhead on single-threaded programs, so in the end it is very easy to make your programs slower. (We have some money left in the donation pot for STM which we are not using; according to the rules, we could declare the STM attempt failed and channel that money towards the present GIL removal proposal.)

Wouldn't subinterpreters be a better idea?

Python is a very mutable language - there are tons of mutable state and basic objects (classes, functions, ...) that are compile-time in other languages but runtime and fully mutable in Python. In the end, sharing things between subinterpreters would be restricted to basic immutable data structures, which defeats the point. Subinterpreters suffer from the same problems as multiprocessing with no additional benefits. We believe that reducing mutability to implement subinterpreters is not viable without seriously impacting the semantics of the language (a conclusion which applies to many other approaches too).

Why is it easier to do in PyPy than CPython?

Removing the GIL in CPython has two problems:

  • how do we guard access to mutable data structures with locks and
  • what to do with reference counting that needs to be guarded.

PyPy only has the former problem; the latter doesn't exist, due to a different garbage collector approach. Of course the first problem is a mess too, but at least we are already half-way there. Compared to Jython or IronPython, PyPy lacks some data structures that are provided by JVM or .NET, which we would need to implement, hence the problem is a little harder than on an existing multithreaded platform. However, there is good research and we know how that problem can be solved.

Best regards,
Maciej Fijalkowski


August 14, 2017 03:34 PM


Doug Hellmann

statistics — Statistical Calculations — PyMOTW 3

The statistics module implements many common statistical formulas for efficient calculations using Python’s various numerical types (int, float, Decimal, and Fraction). Read more… This post is part of the Python Module of the Week series for Python 3. See PyMOTW.com for more articles from the series.
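
As a quick taste of the module (my own minimal example, standard library only):

import statistics
from fractions import Fraction

print(statistics.mean([1, 2, 2, 5]))      # 2.5
print(statistics.median([1, 2, 2, 5]))    # 2.0
print(statistics.mean([Fraction(1, 4), Fraction(3, 4)]))  # 1/2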

August 14, 2017 01:00 PM


Mike Driscoll

PyDev of the Week: Brian E. Granger

This week we welcome Brian E. Granger (@ellisonbg) as our PyDev of the Week! Brian is an early core contributor of the IPython Notebook and now leads the Project Jupyter Notebook team. He is also an Associate Professor of Physics and Data Science at California Polytechnic State University. You can also check out what projects he is working on over at Github. Let’s take a few moments to get to know Brian better!

Can you tell us a little about yourself (hobbies, education, etc):

I am going to start with the fun stuff. Since high school I have been playing the guitar, swimming and meditating. It is hard to be disciplined, but I couldn’t survive without a regular practice of these things. Doing intellectual work, such as coding, for long periods of time (decades) is really taxing on the mind, and that spills over to the body. I truly love coding, but these other things are the biggest reason I am still coding productively at 45.

In some ways, I look like a pretty traditional academic, with a Ph.D. in theoretical physics from the University of Colorado, Boulder, followed by a postdoc and now a tenured faculty position in the Physics Department at Cal Poly San Luis Obispo.

Along the way, I started building open-source software and that has slowly overtaken my entire professional life. Fernando Pérez (IPython’s creator) and I were classmates in graduate school; I began working on IPython around 2005. Fernando remains a dear friend and the best collaborator I could ever ask for. The vision for the IPython/Jupyter notebook came out of a late night discussion over ice cream with him in 2004. It took us until 2011 to ship the original IPython Notebook. Since then my main research focus has been on Project Jupyter and other open-source tools for data science and scientific computing.

Why did you start using Python?

I first used Python as a postdoc in 2003. My first Python program used VPython to simulate and visualize traffic flow. I had written a previous version of the simulation using C++, and couldn’t believe how Python enabled me to spend more time thinking about the physics and less about the code. Within a short period of time, I couldn’t bring myself to keep working in C++ for scientific work.

What other programming languages do you know and which is your favorite?

I used Mathematica in my physics research during the 1990’s. During graduate school and as a postdoc, I worked in C++. At the time, C++ was still pretty painful. I don’t miss that, but modern C++ actually looks quite nice.

Python remains my favorite language, mainly because it is so much fun and has an amazing community.

At the same time, these days I am doing a lot of frontend development for JupyterLab in TypeScript. For a large project with many contributors, having static type checking is revolutionary. TypeScript looks a lot like Python 3’s type annotations, and I can’t wait to begin using Python with static type checking.

What projects are you working on now?

Jupyter and IPython continue to take up most of my time. On that side of things I am working hard with the rest of the JupyterLab team to get the first version of JupyterLab released this summer.

In 2016, Jake VanderPlas and I started Altair, which is a statistical visualization package for Python based on Vega/Vega-Lite from Jeff Heer’s Interactive Data Lab at the University of Washington. While I spend less time on Altair, it, along with Vega/Vega-Lite are a critical part of the overall data ecosystem we are building for Jupyter users.

Which Python libraries are your favorite (core or 3rd party)?

Wow, there are so many. Pandas brought Python to the data world. I love the API design of the libraries that Matt Rocklin has built (Dask, multipledispatch, toolz). In spite of healthy competition from all the new JavaScript based visualization libraries, Matplotlib remains indispensable.

Where do you see Python going as a programming language?

It won’t be long before we are all writing statically type-checked Python 3 code 😉

Is there anything else you’d like to say?

A huge thanks to everyone, users and developers, in the Python community. It is a great blessing to work alongside all of you!

Thanks so much!


August 14, 2017 12:30 PM

August 13, 2017


Sandipan Dey

Dogs vs. Cats: Image Classification with Deep Learning using TensorFlow in Python

The problem: Given a set of labeled images of cats and dogs, a machine learning model is to be learnt and later used to classify a set of new images as cats or dogs. This problem appeared in a Kaggle competition and the images are taken from this kaggle dataset. The original dataset … Continue reading Dogs vs. Cats: Image Classification with Deep Learning using TensorFlow in Python

August 13, 2017 10:28 PM