Planet Python

Last update: August 22, 2017 01:48 PM

August 22, 2017


Talk Python to Me

#126 Kubernetes for Pythonistas

Containers are revolutionizing the way we deploy and manage applications. These containers allow us to build, develop, test, and even deploy on the exact same system. We can build layered systems that fill in our dependencies. They can even play a crucial role in zero-downtime upgrades.

This is great, until you end up with 5 different types of containers, each of them scaled out, and you need to get them to work together, discover each other, and upgrade together. That's where Kubernetes comes in.

Today you'll meet Kelsey Hightower, a developer advocate on Google's cloud platform.

Links from the show:

Kelsey on Twitter: @kelseyhightower (https://twitter.com/kelseyhightower)
Kelsey's PyCon Keynote: https://www.youtube.com/watch?v=u_iAXzy3xBA
Kubernetes: https://kubernetes.io/
Kubernetes on GitHub: https://github.com/kubernetes/kubernetes
FREE COURSE Scalable Microservices with Kubernetes by Google: https://www.udacity.com/course/scalable-microservices-with-kubernetes--ud615

Classic Programmer Paintings: http://classicprogrammerpaintings.com/

August 22, 2017 08:00 AM


Gocept Weblog

Zope preparing to enter Python 3 wonderland

Once upon a time there was an earl named Zope II. His prophets told him that around the year 2020 his peaceful country would suddenly be devastated: they proclaimed that with the “sunset” of Python 2 as a stable pillar of his country, insecurity and pain would invade his borders and hurt everyone living within. There seemed to be only one possible way forward to escape the disaster: flee to the Python 3 wonderland, the source of peace and prosperity.

This was not as easy as one might think. Earl Zope II was already an old man. He had reached the settled age at which change no longer comes easily, and he had many courtiers on his staff whom he needed all day long.

The immigration authority of the Python 3 wonderland was very picky about the people who requested permission to settle down. Many “updates” for Zope II and his staff were required so that they would eventually become “compatible” with the new country. Earl Zope II was even forced to change his name to Zope IV to show that he was ready for the Python 3 wonderland.

After much work with the immigration authorities, it seemed possible for Earl Zope IV to enter; only some – but important – formalities were needed before he could be allowed to settle down and call himself a citizen of the Python 3 wonderland.

This is where the tale gets real: we need your help to release a beta version of Zope 4. The hard work seems to be done, but some polish and testing are still required to reach this goal.

We invite you to the Zope 4 Phoenix Sprint to help raise Zope 4 from the ashes! From Wednesday, 13th, until Friday, 15th of September 2017, we will sprint at the gocept office in Halle (Saale), Germany, towards the beta release.

Possible sprint topics could be:

You are heartily invited to join us for the honour of Earl Zope IV.


August 22, 2017 07:37 AM


Anwesha Das

The mistakes I made in my blog posts

Today we will be discussing the mistakes I made in my blog posts.
I started (seriously) writing blogs a year back. A few of my posts got a pretty nice response. The praise put me in seventh heaven. I thought I was a fairly good blogger. But after almost a year of writing, one day I chanced upon one of my older posts, and reading it sent me crashing down to earth.

There was a huge list of mistakes I had made:

The post was a perfect example of TLDR. I used to judge a post by its quantity: the larger the number of words, the better! (Typical lawyer mentality!)

The title and the lead paragraph were vague.

The sentences were long (far too long).

There were plenty of grammatical mistakes.

I lost the flow of thought and broke the logical chain in many places.

The measures I took to solve my problem

I was upset. I stopped writing for a month or so.
After the depressed, dispirited phase was over, I got back up, dusted myself off, and tried to find ways to become a better writer.

Talks, books, blogs:

I searched for talks, articles, and books on “how to write good blog posts” and started reading and watching videos. I tried to follow their advice while writing my posts.

Earlier, I used to take a lot of time (a week) to write each post. I would flit from one sentence to the next new one so that I would not forget the latest idea or the next thought that popped into my head.
But that caused two major problems:

First, the long writing time also meant long breaks. The interval broke my chain of thought anyway, and I had to start again from the beginning. That resulted in confusing views and unrelated sentences.

Second, it also made the posts far too long.

Now I dedicate limited time, a few hours, to each post, depending on the idea.
And I strictly adhere to those hours. I use Tomato Timer to keep a check on the time. During that time I do not go to my web browser, check my phone, or do any household activity, and of course, I ignore my husband completely.
But one thing I have not been able to avoid is the “Mamma no working. Let's play” situation. :)
I focus on the sentence I am writing. I do not jump between sentences. I’ve made peace with the fear of losing one thought and I do not disturb the one I am working on. This keeps my ideas clear.

To finish my work within the stipulated time:
- I write during quieter hours, especially in the morning,
- I plan what to write the day before,
- I am caffeinated while writing.

Sometimes I cannot finish a post in one go. Then, before starting the next day, I read aloud what I wrote previously.

Revision:

Previously after I finished writing, I used to correct only the red underlines. Now I take time and follow four steps before publishing a post:

Respect the readers

This single piece of advice has changed my posts for better.
Respect the reader.
Don’t give them any false hopes or expectations.

With that in mind, I have altered the following two things in my blog:

Vague titles

I always tried to think out of the box and figured that sarcastic titles would showcase my intelligence. I thought an offhand, humorous title was good. How utterly wrong I was.

People search by asking relevant questions on the topic.
For example, for a hardware project with the esp8266 using micropython, people may search for:
- “esp8266 projects”
- “projects with micropython”
- “fun hardware projects”, etc.
But no one will search for “mybunny uncle” (it might remind you of your kindly uncle, but definitely not of a hardware project in any sense of the term).

People find your blogs through RSS feeds or search engines.
So be as direct as possible. Give a title that describes the core of the content. In the words of Cory Doctorow, write your headlines as if you were a wire-service writer.

Vague lead paragraph

The lead paragraph, the opening paragraph of your post, must explain what follows. The lead paragraph is often part of the search result.

Avoid conjunctions and past participles

I try not to use conjunctions, connecting clauses, or past participles. These make a sentence complicated to read.

Use simple words

I use simple, easy words instead of hard, heavy, and huge words. It was so difficult to make the lawyer (inside me) understand that “simple is better than complicated”.

The one thing that is still difficult for me is letting go: accepting that not all of my posts will be great, or even good.
There will be faults in them, which is fine.
Instead of putting all my effort into making a single piece better, I’d rather move on and work on other topics.

August 22, 2017 03:18 AM


Daniel Bader

What Are Python Generators?

Generators are a tricky subject in Python. With this tutorial you’ll make the leap from class-based iterators to using generator functions and the “yield” statement in no time.

Python Generators Tutorial

If you’ve ever implemented a class-based iterator from scratch in Python, you know that this endeavour requires writing quite a bit of boilerplate code.

And yet, iterators are so useful in Python: They allow you to write pretty for-in loops and help you make your code more Pythonic and efficient.

As a (proud) “lazy” Python developer, I don’t like tedious and repetitive work. And so, I often found myself wondering:

If only there were a more convenient way to write these Python iterators in the first place…

Surprise, there is! Once again, Python helps us out with some syntactic sugar to make writing iterators easier.

In this tutorial you’ll see how to write Python iterators faster and with less code using generators and the yield keyword.

Ready? Let’s go!

Python Generators 101 – The Basics

Let’s start by looking again at the Repeater example that I previously used to introduce the idea of iterators. It implemented a class-based iterator cycling through an infinite sequence of values.

This is what the class looked like in its second (simplified) version:

class Repeater:
    def __init__(self, value):
        self.value = value

    def __iter__(self):
        return self

    def __next__(self):
        return self.value

If you’re thinking, “that’s quite a lot of code for such a simple iterator,” you’re absolutely right. Parts of this class seem rather formulaic, as if they would be written in exactly the same way from one class-based iterator to the next.

This is where Python’s generators enter the scene. If I rewrite this iterator class as a generator, it looks like this:

def repeater(value):
    while True:
        yield value

We just went from seven lines of code to three.

Not bad, eh? As you can see, generators look like regular functions but instead of using the return statement, they use yield to pass data back to the caller.

Will this new generator implementation still work the same way as our class-based iterator did? Let’s bust out the for-in loop test to find out:

>>> for x in repeater('Hi'):
...     print(x)
Hi
Hi
Hi
Hi
Hi
...

Yep! We’re still looping through our greetings forever. This much shorter generator implementation seems to perform the same way that the Repeater class did.

(Remember to hit Ctrl+C if you want to get out of the infinite loop in an interpreter session.)
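Pressing Ctrl+C works, but you can also take only the first few values from an infinite generator. Here is a minimal sketch that uses `itertools.islice` from the standard library. This technique is not part of the original tutorial; it only shows that a consumer can stop an endless generator by asking for a fixed number of items.

```python
from itertools import islice


def repeater(value):
    # Infinite generator: yields the same value forever
    while True:
        yield value


# islice stops after 3 items, so the loop ends on its own
for x in islice(repeater('Hi'), 3):
    print(x)

# The same slice collected into a list
first_three = list(islice(repeater('Hi'), 3))
```

This works because the generator only produces a value when the consumer asks for one. `islice` simply stops asking after the third item.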

Now, how do these generators work? They look like normal functions, but their behavior is quite different. For starters, calling a generator function doesn’t even run the function. It merely creates and returns a generator object:

>>> repeater('Hey')
<generator object repeater at 0x107bcdbf8>

The code in the generator function only executes when next() is called on the generator object:

>>> generator_obj = repeater('Hey')
>>> next(generator_obj)
'Hey'
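You can watch the delayed start yourself by giving the generator a visible side effect. The sketch below uses `noisy_repeater`, a hypothetical variant of `repeater` that was made up for illustration. It appends to a list the moment its body begins executing.

```python
log = []


def noisy_repeater(value):
    # Hypothetical variant of repeater() that records when its body starts running
    log.append('body started')
    while True:
        yield value


gen = noisy_repeater('Hey')
print(log)        # the body has not run yet, so the log is still empty
print(next(gen))  # runs the body up to the first yield, then pauses
print(log)        # now the log holds one entry
print(next(gen))  # resumes after the yield; 'body started' is not logged again
print(log)
```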

If you read the code of the repeater function again, it looks like the yield keyword in there somehow stops this generator function in mid-execution and then resumes it at a later point in time:

def repeater(value):
    while True:
        yield value

And that’s quite a fitting mental model for what happens here. You see, when a return statement is invoked inside a function, it permanently passes control back to the caller of the function. When a yield is invoked, it also passes control back to the caller of the function—but it only does so temporarily.

Whereas a return statement disposes of a function’s local state, a yield statement suspends the function and retains its local state.

In practical terms, this means local variables and the execution state of the generator function are only stashed away temporarily and not thrown out completely.
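The following sketch makes the retained local state visible. `counter` is an illustrative generator, not one from the tutorial. Its local variable `current` keeps its value across `next()` calls instead of being reset each time.

```python
def counter(start):
    # 'current' is a local variable that survives between next() calls
    current = start
    while True:
        yield current
        current += 1  # runs only when the generator is resumed


c = counter(10)
values = [next(c), next(c), next(c)]
print(values)
```

If the function's locals were thrown away after each `yield`, every call would start again from `start`. Instead, each `next()` resumes just after the `yield` with `current` intact.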

Execution can be resumed at any time by calling next() on the generator:

>>> iterator = repeater('Hi')
>>> next(iterator)
'Hi'
>>> next(iterator)
'Hi'
>>> next(iterator)
'Hi'

This makes generators fully compatible with the iterator protocol. For this reason, I like to think of them primarily as syntactic sugar for implementing iterators.

You’ll find that for most types of iterators, writing a generator function will be easier and more readable than defining a long-winded class-based iterator.

Python Generators That Stop Generating

In this tutorial we started out by writing an infinite generator once again. By now you’re probably wondering how to write a generator that stops producing values after a while, instead of going on and on forever.

Remember, in our class-based iterator we were able to signal the end of iteration by manually raising a StopIteration exception. Because generators are fully compatible with class-based iterators, that’s still what happens behind the scenes.

Thankfully, as programmers we get to work with a nicer interface this time around. Generators stop generating values as soon as control flow returns from the generator function by any means other than a yield statement. This means you no longer have to worry about raising StopIteration at all!

Here’s an example:

def repeat_three_times(value):
    yield value
    yield value
    yield value

Notice how this generator function doesn’t include any kind of loop. In fact it’s dead simple and only consists of three yield statements. If a yield temporarily suspends execution of the function and passes back a value to the caller, what will happen when we reach the end of this generator?

Let’s find out:

>>> for x in repeat_three_times('Hey there'):
...     print(x)
Hey there
Hey there
Hey there

As you may have expected, this generator stopped producing new values after three iterations. We can assume that it did so by raising a StopIteration exception when execution reached the end of the function.

But to be sure, let’s confirm that with another experiment:

>>> iterator = repeat_three_times('Hey there')
>>> next(iterator)
'Hey there'
>>> next(iterator)
'Hey there'
>>> next(iterator)
'Hey there'
>>> next(iterator)
StopIteration
>>> next(iterator)
StopIteration

This iterator behaved just like we expected. As soon as we reach the end of the generator function, it keeps raising StopIteration to signal that it has no more values to provide.
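A for-in loop relies on exactly this signal. The sketch below is a simplified, hand-written equivalent of a for loop; the real loop is implemented inside the interpreter. It calls `iter()` once and then calls `next()` repeatedly until `StopIteration` is raised. `manual_for` is a hypothetical helper named for illustration.

```python
def repeat_three_times(value):
    yield value
    yield value
    yield value


def manual_for(iterable):
    # Simplified equivalent of: results = [x for x in iterable]
    results = []
    iterator = iter(iterable)      # a generator object returns itself here
    while True:
        try:
            x = next(iterator)
        except StopIteration:      # raised automatically when the generator function ends
            break
        results.append(x)
    return results


print(manual_for(repeat_three_times('Hey there')))
```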

Let’s come back to another example from my Python iterators tutorials. The BoundedRepeater class implemented an iterator that would only repeat a value a set number of times:

class BoundedRepeater:
    def __init__(self, value, max_repeats):
        self.value = value
        self.max_repeats = max_repeats
        self.count = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.count >= self.max_repeats:
            raise StopIteration
        self.count += 1
        return self.value

Why don’t we try to re-implement this BoundedRepeater class as a generator function? Here’s my first take on it:

def bounded_repeater(value, max_repeats):
    count = 0
    while True:
        if count >= max_repeats:
            return
        count += 1
        yield value

I intentionally made the while loop in this function a little unwieldy. I wanted to demonstrate how invoking a return statement from a generator causes iteration to stop with a StopIteration exception. We’ll soon clean up and simplify this generator function some more, but first let’s try out what we’ve got so far:

>>> for x in bounded_repeater('Hi', 4):
...     print(x)
Hi
Hi
Hi
Hi

Great! Now we have a generator that stops producing values after a configurable number of repetitions. It uses the yield statement to pass back values until it finally hits the return statement and iteration stops.

Like I promised you, we can further simplify this generator. We’ll take advantage of the fact that Python adds an implicit return None statement to the end of every function. This is what our final implementation looks like:

def bounded_repeater(value, max_repeats):
    for _ in range(max_repeats):
        yield value

Feel free to confirm that this simplified generator still works the same way. All things considered, we went from a 12-line iterator in the BoundedRepeater class to a three-line generator-based implementation providing the same functionality.
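One way to run that confirmation is to exhaust both implementations with `list()` and compare the results. This is a minimal sketch that copies the class and the final generator from above, including an edge case with zero repeats.

```python
class BoundedRepeater:
    def __init__(self, value, max_repeats):
        self.value = value
        self.max_repeats = max_repeats
        self.count = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.count >= self.max_repeats:
            raise StopIteration
        self.count += 1
        return self.value


def bounded_repeater(value, max_repeats):
    for _ in range(max_repeats):
        yield value


# list() drives each iterator until it raises StopIteration
class_based = list(BoundedRepeater('Hi', 4))
generator_based = list(bounded_repeater('Hi', 4))
print(class_based == generator_based)

# Edge case: zero repeats should produce nothing in both versions
print(list(BoundedRepeater('Hi', 0)) == list(bounded_repeater('Hi', 0)))
```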

That’s a 75% reduction in the number of lines of code—not too shabby!

Generator functions are a great feature in Python, and you shouldn’t hesitate to use them in your own programs.

As you just saw, generators help you “abstract away” most of the boilerplate code otherwise needed when writing class-based iterators. Generators can make your life as a Pythonista much easier and allow you to write cleaner, shorter, and more maintainable iterators.

Python Generators – A Quick Summary

August 22, 2017 12:00 AM

August 21, 2017


Doug Hellmann

smtpd — Sample Mail Servers — PyMOTW 3

The smtpd module includes classes for building simple Simple Mail Transfer Protocol (SMTP) servers. It is the server side of the protocol used by smtplib. This post is part of the Python Module of the Week series for Python 3. See PyMOTW.com for more articles from the series.

August 21, 2017 01:00 PM


Mike Driscoll

PyDev of the Week: Katherine Scott

This week we welcome Katherine Scott (@kscottz) as our PyDev of the Week! Katherine was the lead developer of the SimpleCV computer vision library and co-author of the SimpleCV O’Reilly Book. You can check out Katherine’s open source projects over on GitHub. Let’s take a few moments to get to know her better!

Can you tell us a little about yourself (hobbies, education, etc):

A quick summary about me:

I am currently the image analytics team lead at Planet Labs. Planet is one of the largest satellite imaging companies in the world, and my team helps take Planet’s daily satellite imagery and turn it into actionable information. We currently image the entire planet every day at ~3m resolution, and not only do I get to see that data, but I also have the resources to apply my computer vision skills to our whole data set. On top of this, I get to work on stuff in space! It goes without saying that I absolutely love my job. I am also on the board of the Open Source Hardware Association, and I help put together the Open Hardware Summit.

Prior to working at Planet, I co-founded two successful start-ups, Tempo Automation and SightMachine. Before founding those two start-ups, I worked at a really awesome research and development company called Cybernet Systems. While I was at Cybernet, I did computer vision, augmented reality, and robotics research.

Education:
I graduated from the University of Michigan in 2005 with dual degrees in computer engineering and electrical engineering. To put myself through school, I worked as a research assistant with a couple of really awesome labs, where I did research on MEMS neural prosthetics and the RHex Robot (a cousin to the Big Dog robot you may be familiar with). In 2010 I decided to go back to school to get my master’s degree at Columbia University. I majored in computer science with a focus on computer vision and robotics. It was at the tail end of grad school that I got bit by the start-up bug and helped start Sight Machine.

Hobbies:
My hobbies are currently constrained by my tiny apartment in San Francisco, but I like to build and make stuff (art, hardware, software, etc) in my spare time. I am also really into music so I go to a lot of live shows. As I’ve gotten older I’ve found that I need to exercise if I want to stay in front of a screen so I like to walk, bike, and do pilates. I am also the owner of three pet rats. I started keeping rats after working with them in the lab during college.

Why did you start using Python?

I was almost exclusively a C/C++/C# user for the first ten years I was an engineer. There was some Lua and Java mixed in here and there but I spent 90% of my time writing C++ from scratch. When I started at SightMachine I switched over to Python to help build a computer vision library called SimpleCV for the company. I fell in love almost immediately. Python allowed me to abstract away a lot of the compiler, linker, and memory management related tasks and focus more on computer vision algorithm development. The sheer volume of scientific and mathematical libraries was also a fantastic resource.

What other programming languages do you know and which is your favorite?
I’ve been a professional engineer for twelve years now, so I have basically seen it all and done it all. I’ve done non-trivial projects in C, C++, C#, Java, JavaScript, and Python, and I’ve dabbled in some of the more esoteric languages like Lisp, Lua, CoffeeScript, and OCaml. Python is my favorite because the “batteries are included.” With so many libraries and packages out there, it is like having a super power: if I can dream it up, I can code it.

What projects are you working on now?

My job keeps me very busy right now, but it is super rewarding, as I feel like we are giving everyone on Earth an ability to see the planet in real time. In April, Planet released a Kaggle competition that focuses on detecting illegal mining and deforestation in the Amazon. More recently, I wrapped up working on my latest PyCon talk and putting together the speaker list for the Open Hardware Summit. With this stuff out of the way, I am starting a couple of new projects with some far left activist groups in the Bay Area. We are trying to put together an activist hack-a-thon where we develop tools for Bay Area non-profits. The project I am going to focus on specifically is a tool to systematically mine and analyze the advertising content of hate speech websites in an effort to defund them. These projects are still in the planning stage, but I am hoping to have them up and running by late summer.

Which Python libraries are your favorite (core or 3rd party)?

The whole scientific Python community is amazing, and I am a huge fan of Project Jupyter. Given my line of work, I use OpenCV, Keras, Scikit, Pandas, and Numpy on a daily basis. Now that I am doing GIS work, I have been exploring that space quite a bit. Right now I am getting really familiar with GeoPandas, Shapely, GDAL’s Python bindings, and libraries that provide interfaces to OpenStreetMap, just to name a few. I also want to give a big shout out to the Robot Operating System and the Open Source Robotics Foundation.

Is there anything else you’d like to say?

I have a lot of things I could say but most of them would become a rant. I will say I try to make myself available over the internet, particularly to younger engineers just learning their craft. If you have questions about my field or software engineering in general, don’t hesitate to reach out.

Thanks for doing the interview!

August 21, 2017 12:30 PM


PyCharm

PyCharm 2017.2.2 RC is now available

Another set of bugs has been fixed, and PyCharm 2017.2.2 RC is now available from our Confluence page.

Improvements in this release:

We’d like to thank our users who have reported these bugs for helping us to resolve them! If you find a bug, please let us know on YouTrack.

If you use Django, but don’t have PyCharm Professional Edition yet, you may be interested to learn about our Django Software Foundation promotion. You can get a 30% discount on PyCharm, and support the Django Software Foundation at the same time.

-PyCharm Team
The Drive to Develop

August 21, 2017 11:09 AM


Catalin George Festila

Using pip from the shell to install and use pymunk.

Today’s tutorial shows how to use pip from the Python shell to install a Python package.
The first step is shown in the next image:

August 21, 2017 06:38 AM


Tomasz Früboes

Look ma, I made a browser game!

I’ve been using python for a while now. At first, it was somewhat forced on me as the configuration language of the large (C++-based) software framework of one of the largest physics experiments to date. Then I implemented, in python, the data analysis that was the basis of my Ph.D. dissertation. Finally, for the last couple of years, I have been using python for a wide range of activities that one would call “data science”.

Some time ago I decided that trying something different would be essential for my mental hygiene. As a typical nerd, I didn’t go for rock climbing or piano lessons, but chose to apply python in a different manner: to flirt with the web world.

The last time I tried web page programming, the Netscape browser was still a thing, and every HTML file you created had to include an animated gif of a digger saying “under construction”. As you can see, my webdev skills at that point were somewhat dated…

I already knew that python offers a number of web frameworks with different philosophies behind them. I decided to go for flask, as it is quite popular and seems to be more DIY than django. As an initial project, I decided to implement a tic-tac-toe game, with the following goals:

A basic AJAX example

It turns out that the basic AJAX part is rather easy to implement in flask. The example below demonstrates basic javascript (browser) – python (server) interaction using AJAX. The file structure of this simple experiment is as follows:

├── basic
│     ├── __init__.py
│     ├── server.py
│     └── templates
│           ├── index.html
│           └── script.js
└── setup.py

The python part (server.py) looks like this:

import datetime
from flask import Flask, send_from_directory, jsonify

app = Flask(__name__)
app.config.from_object(__name__)

@app.route('/')
def index():
    return send_from_directory("templates", "index.html")

@app.route('/<string:name>')
def static_files(name):
    return send_from_directory("templates", name)

@app.route('/hello')
def hello():
    result="Hey, I saw that! You clicked at {}".format(datetime.datetime.now())
    return jsonify(result=result)

The two lines that create and configure app are needed by flask to create the application. The “@app.route” decorator is flask’s way of defining which function should handle a given URL. The first two functions handle standard requests for web page files (e.g. index.html): in response, predefined files from the templates directory are served (note: nothing is generated dynamically). The last function is our AJAX handler. It simply returns a string with the current date and time, packed as JSON (which stands for JavaScript Object Notation).
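You can check the AJAX handler without a browser by using Flask's built-in test client. The sketch below assumes Flask is installed. It recreates only the `/hello` route from server.py and leaves out the template-serving routes so that the snippet runs on its own.

```python
import datetime
import json

from flask import Flask, jsonify

app = Flask(__name__)


@app.route('/hello')
def hello():
    result = "Hey, I saw that! You clicked at {}".format(datetime.datetime.now())
    return jsonify(result=result)


# The test client sends requests to the app in-process; no server is started
client = app.test_client()
response = client.get('/hello')

# The body is the same JSON object the browser-side receive_hello() reads as data.result
payload = json.loads(response.get_data(as_text=True))
print(response.status_code, payload['result'])
```

This is the contract the javascript side depends on: a JSON object with a single `result` key.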

The client (javascript) part is as follows:

function receive_hello(data){
    $("#mytarget").html(data.result)
}

function send_hello(jqevent) {
    $.getJSON("/hello", {}, receive_hello)
}

$( document ).ready(function() {
  $( window ).click(send_hello);
});

There are a couple of things happening in the listing above:

Getting the code and running it

The source code for the example above (and for the not-yet-mentioned tictactoe implementation) can be downloaded as follows:

git clone https://github.com/fruboes/tictactoe.git

cd tictactoe/
virtualenv venv
source venv/bin/activate

pip install -e basic/
pip install -e tictactoe/

To run it, simply execute the following:

cd basic/
./start_basic.sh 

You should see a link in the terminal. After opening it and clicking anywhere on the web page in the browser you should see current date and time printed in the window. Subsequent clicks should update the time.

Da game!

If you followed the setup instructions above, you have already downloaded the source code for the tictactoe implementation. Since this was supposed to be an exercise in web development, the AI part was intentionally left stupid. If you care enough, you may want to experiment and improve it. I won’t go through the sources, since essentially this is an extended version of the basic example above.

In order to run the game, you should execute the tictactoe/start.sh script. Or you may use a running instance at http://tictactoe.pythonanywhere.com/ (as this is a free tier of pythonanywhere, I’ll keep it running for the next month).

Lessons learnt along the way

Basic web development with python is rather easy to start. If you are going to try it yourself, the following may be helpful:

August 21, 2017 05:56 AM


Codementor

Getting Started with Scraping in Python

A useful guide to getting started with web scraping in Python.

August 21, 2017 05:13 AM

August 20, 2017


Simple is Better Than Complex

How to Use Celery and RabbitMQ with Django

Celery is an asynchronous task queue based on distributed message passing. Task queues are used as a strategy to distribute the workload between threads/machines. In this tutorial I will explain how to install and set up Celery + RabbitMQ to execute asynchronous tasks in a Django application.

To work with Celery, we also need to install RabbitMQ because Celery requires an external solution to send and receive messages. Those solutions are called message brokers. Currently, Celery supports RabbitMQ, Redis, and Amazon SQS as message broker solutions.

Why Should I Use Celery?

Web applications work with request and response cycles. When the user accesses a certain URL of your application, the Web browser sends a request to your server. Django receives this request and does something with it. Usually this involves executing database queries and processing data. While Django does its thing and processes the request, the user has to wait. When Django finishes processing the request, it sends a response back to the user, who finally sees something.

Ideally, this request and response cycle should be fast; otherwise we would leave the user waiting for way too long. Even worse, our Web server can only serve a certain number of users at a time, so if this process is slow, it can limit the number of pages your application can serve at a time.

For the most part we can work around this issue using caching, optimizing database queries, and so on. But there are some cases where there’s no other option: the heavy work has to be done. Report pages, exports of large amounts of data, and video/image processing are a few examples of cases where you may want to use Celery.

We don’t use Celery throughout the whole project, only for specific tasks that are time-consuming. The idea here is to respond to the user as quickly as possible, pass the time-consuming tasks to the queue so they are executed in the background, and always keep the server ready to respond to new requests.
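The core idea of handing slow work to a queue can be illustrated with the standard library alone. This is only an in-process sketch: Celery itself runs workers as separate processes and communicates with them through a message broker such as RabbitMQ, and none of the names below are Celery APIs. `slow_report` is a hypothetical stand-in for a time-consuming task.

```python
import queue
import threading
import time

task_queue = queue.Queue()
results = []


def slow_report(n):
    # Hypothetical stand-in for time-consuming work such as building a report
    time.sleep(0.1)
    return 'report {} ready'.format(n)


def worker():
    # Background worker: pulls tasks off the queue until it sees the None sentinel
    while True:
        item = task_queue.get()
        if item is None:
            task_queue.task_done()
            break
        func, arg = item
        results.append(func(arg))
        task_queue.task_done()


threading.Thread(target=worker, daemon=True).start()

# The "request handler" only enqueues the work and could respond immediately
for n in range(3):
    task_queue.put((slow_report, n))

task_queue.put(None)   # tell the worker to stop once the queue is drained
task_queue.join()      # wait here only so the example can show the results
print(results)
```

In a real web application the handler would not call `join()`. It would return a response right after enqueuing, which is exactly the behavior Celery provides across processes and machines.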


Installation

The easiest way to install Celery is using pip:

pip install Celery

Now we have to install RabbitMQ.

Installing RabbitMQ on Ubuntu 16.04

Installing it on a newer Ubuntu version is very straightforward:

apt-get install -y erlang
apt-get install rabbitmq-server

Then enable and start the RabbitMQ service:

systemctl enable rabbitmq-server
systemctl start rabbitmq-server

Check the status to make sure everything is running smoothly:

systemctl status rabbitmq-server

Installing RabbitMQ on Mac

Homebrew is the most straightforward option:

brew install rabbitmq

The RabbitMQ scripts are installed into /usr/local/sbin. You can add this directory to your PATH in your .bash_profile or .profile.

vim ~/.bash_profile

Then add it to the bottom of the file:

export PATH=$PATH:/usr/local/sbin

Restart the terminal to make sure the changes are in effect.

Now you can start the RabbitMQ server using the following command:

rabbitmq-server

Installing RabbitMQ on Windows and Other OSs

Unfortunately I don’t have access to a Windows computer to try things out, but you can find the installation guide for Windows on RabbitMQ’s Website.

For other operating systems, check the Downloading and Installing RabbitMQ page on their Website.


Celery Basic Setup

First, consider the following Django project named mysite with an app named core:

mysite/
 |-- mysite/
 |    |-- core/
 |    |    |-- migrations/
 |    |    |-- templates/
 |    |    |-- apps.py
 |    |    |-- models.py
 |    |    +-- views.py
 |    |-- templates/
 |    |-- __init__.py
 |    |-- settings.py
 |    |-- urls.py
 |    +-- wsgi.py
 |-- manage.py
 +-- requirements.txt

Add the CELERY_BROKER_URL configuration to the settings.py file:

settings.py

CELERY_BROKER_URL = 'amqp://localhost'

Alongside the settings.py and urls.py files, let’s create a new file named celery.py.

celery.py

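import os
from celery import Celery

# Set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'mysite.settings')

app = Celery('mysite')

# Read config from Django settings; namespace='CELERY' means every
# Celery-related setting must use the CELERY_ prefix (e.g. CELERY_BROKER_URL).
app.config_from_object('django.conf:settings', namespace='CELERY')

# Look for a tasks.py module in every installed Django app.
app.autodiscover_tasks()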

Now edit the __init__.py file in the project package (mysite/mysite/__init__.py):

__init__.py

from .celery import app as celery_app

__all__ = ['celery_app']

This will make sure our Celery app is imported every time Django starts.


Creating Our First Celery Task

We can create a file named tasks.py inside a Django app and put all our Celery tasks into this file. The Celery app we created in the project root will collect all tasks defined across all Django apps listed in the INSTALLED_APPS configuration.

Just for testing purposes, let’s create a Celery task that generates a number of random User accounts.

core/tasks.py

import string

from django.contrib.auth.models import User
from django.utils.crypto import get_random_string

from celery import shared_task

@shared_task
def create_random_user_accounts(total):
    for i in range(total):
        username = 'user_{}'.format(get_random_string(10, string.ascii_letters))
        email = '{}@example.com'.format(username)
        password = get_random_string(50)
        User.objects.create_user(username=username, email=email, password=password)
    return '{} random users created with success!'.format(total)

The important bits here are:

from celery import shared_task

@shared_task
def name_of_your_function(optional_param):
    pass  # do something heavy

Then I defined a form and a view to trigger my Celery task:

forms.py

from django import forms
from django.core.validators import MinValueValidator, MaxValueValidator

class GenerateRandomUserForm(forms.Form):
    total = forms.IntegerField(
        validators=[
            MinValueValidator(50),
            MaxValueValidator(500)
        ]
    )

This form expects an integer between 50 and 500. It looks like this:

Generate random users form

Then my view:

views.py

from django.contrib.auth.models import User
from django.contrib import messages
from django.views.generic.edit import FormView
from django.shortcuts import redirect

from .forms import GenerateRandomUserForm
from .tasks import create_random_user_accounts

class GenerateRandomUserView(FormView):
    template_name = 'core/generate_random_users.html'
    form_class = GenerateRandomUserForm

    def form_valid(self, form):
        total = form.cleaned_data.get('total')
        create_random_user_accounts.delay(total)
        messages.success(self.request, 'We are generating your random users! Wait a moment and refresh this page.')
        return redirect('users_list')

The important bit is here:

create_random_user_accounts.delay(total)

Instead of calling create_random_user_accounts directly, I’m calling create_random_user_accounts.delay(). This way, we instruct Celery to execute this function in the background.

Django then keeps processing the GenerateRandomUserView view and returns the response to the user right away.
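The toy decorator below imitates, in plain Python, what calling .delay() conceptually means: instead of running the function, it records a message (task name plus arguments) that a worker picks up later. Everything here is hypothetical (toy_shared_task, pending_messages and run_worker are made-up names); real Celery serializes the message and sends it to the broker configured in CELERY_BROKER_URL, and the worker runs in a separate process.

```python
# Hypothetical in-memory "broker" and task registry.
pending_messages = []
registry = {}


def toy_shared_task(func):
    # Register the function so the worker can look it up by name.
    registry[func.__name__] = func

    def delay(*args, **kwargs):
        # Queue a message instead of calling func directly.
        pending_messages.append((func.__name__, args, kwargs))

    func.delay = delay
    return func


def run_worker():
    # Drain the queue, executing each task in FIFO order.
    outputs = []
    while pending_messages:
        name, args, kwargs = pending_messages.pop(0)
        outputs.append(registry[name](*args, **kwargs))
    return outputs


@toy_shared_task
def create_random_user_accounts(total):
    return '{} random users created with success!'.format(total)


create_random_user_accounts.delay(500)  # nothing runs yet
print(pending_messages)  # [('create_random_user_accounts', (500,), {})]
print(run_worker())      # ['500 random users created with success!']
```

The function itself is unchanged, so calling it directly still runs it synchronously; only .delay() goes through the queue.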

But before you try it, check the next section to learn how to start the Celery worker process.


Starting The Worker Process

Open a new terminal tab, and run the following command:

celery -A mysite worker -l info

Change mysite to the name of your project. The result is something like this:

Celery Worker

Now we can test it. I submitted 500 in my form to create 500 random users.

The response is immediate:

Random

Meanwhile, checking the Celery Worker Process:

[2017-08-20 19:11:17,485: INFO/MainProcess] Received task: mysite.core.tasks.create_random_user_accounts[8799cfbd-deae-41aa-afac-95ed4cc859b0]

Then after a few seconds, if we refresh the page, the users are there:

Random

If we check the Celery Worker Process again, we can see it completed the execution:

[2017-08-20 19:11:45,721: INFO/ForkPoolWorker-2] Task mysite.core.tasks.create_random_user_accounts[8799cfbd-deae-41aa-afac-95ed4cc859b0] succeeded in 28.225658523035236s: '500 random users created with success!'


Managing The Worker Process in Production with Supervisord

If you are deploying your application to a VPS like DigitalOcean, you will want to run the worker process in the background. In my tutorials, I like to use Supervisord to manage the Gunicorn workers, so it’s usually a nice fit with Celery.

First install it (on Ubuntu):

sudo apt-get install supervisor

Then create a file named mysite-celery.conf in the folder /etc/supervisor/conf.d/:

[program:mysite-celery]
command=/home/mysite/bin/celery worker -A mysite --loglevel=INFO
directory=/home/mysite/mysite
user=nobody
numprocs=1
stdout_logfile=/home/mysite/logs/celery.log
stderr_logfile=/home/mysite/logs/celery.log
autostart=true
autorestart=true
startsecs=10

; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600

stopasgroup=true

; Set Celery priority higher than default (999)
; so, if rabbitmq is supervised, it will start first.
priority=1000

In the example above, I’m assuming my Django project is inside a virtual environment. The path to my virtual environment is /home/mysite/.

Now reread the configuration and add the new process:

sudo supervisorctl reread
sudo supervisorctl update

If you are not familiar with deploying Django to a production server and working with Supervisord, maybe this part will make more sense if you check this post from the blog: How to Deploy a Django Application to Digital Ocean.


Further Reading

Those are the basic steps. I hope this helped you to get started with Celery. I will leave here a few useful references to keep learning about Celery:

And as usual, the code examples used in this tutorial are available on GitHub:

github.com/sibtc/django-celery-example


If you want to try this setup on an Ubuntu cloud server, you can use this referral link to get $10 in free credit from Digital Ocean.

August 20, 2017 07:54 PM


NumFOCUS

How to Compare Photos of the Solar Eclipse using Python & SunPy

The following post was written by Steven Christe of SunPy. A Rare Opportunity to View the Solar Corona A solar eclipse presents a unique opportunity for us to view the solar corona or solar atmosphere. In visible light, the corona is about one million times fainter than the sun, or about as bright as the moon. […]

August 20, 2017 05:39 PM


Import Python

Import Python 138 - 18th Aug 2017

Worthy Read

In this Python Programming Tutorial, we will be learning how to unit-test our code using the unittest module. Unit testing will allow you to be more comfortable with refactoring and knowing whether or not your updates broke any of your existing code. Unit testing is a must on any large projects and is used by all major companies. Not only that, but it will greatly improve your personal code as well. Let's get started.
video
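As a quick taste of the unittest module, here is a minimal sketch; the add function under test is hypothetical. The suite is run programmatically with TextTestRunner instead of unittest.main(), which would exit the interpreter when it finishes.

```python
import unittest


def add(x, y):
    # Hypothetical function under test.
    return x + y


class TestAdd(unittest.TestCase):
    def test_integers(self):
        self.assertEqual(add(2, 3), 5)

    def test_strings(self):
        # + also concatenates strings.
        self.assertEqual(add('py', 'thon'), 'python')


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestAdd)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```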

This website is an online converter that reads Python 2.x source code and applies a series of fixers to transform it into valid Python 3.x code. Enter your Python 2 code on the left, hit the button, and boom: Python 3 code on the right.
Python 3

This is a story about how very difficult it is to build concurrent programs. It’s also a story about a bug in Python’s Queue class, a class which happens to be the easiest way to make concurrency simple in Python. This is not a happy story: this is a tragedy, a story of deadlocks and despair.
concurrency

Notebook based on James Powell's talk at PyData 2017.
core-python

F-strings provide a concise and convenient way to embed Python expressions inside string literals for formatting.
f-strings
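A few illustrative f-strings (Python 3.6+); the variable names are arbitrary. Any expression can go inside the braces, and a format specification can follow a colon, including a nested field for a dynamic width.

```python
name = 'world'
width = 7
pi = 3.14159

greeting = f'Hello, {name.title()}!'  # method call inside the braces
pi_text = f'pi is about {pi:.2f}'     # format spec after the colon
padded = f'[{name:>{width}}]'         # right-align to a dynamic width

print(greeting)  # Hello, World!
print(pi_text)   # pi is about 3.14
print(padded)    # [  world]
```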

Reddit Discussion
pypi

Roughly, Hy is to Python as Clojure is to Java. Hy completely inter-ops with Python. I've hit commit 1,500 in my Hy project at work. I wanted to share my experience working with Hy, where I feel it shines and where it falls short.
lisp

A "fold" is a fundamental primitive in defining operations on data structures; it's particularly important in functional languages where recursion is the default tool to express repetition. In this article I'll present how left and right folds work and how they map to some fundamental recursive patterns. The article starts with Python, which should be (or at least look) familiar to most programmers. It then switches to Haskell for a discussion of more advanced topics like the connection between folding and laziness, as well as monoids.
haskell
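In Python, functools.reduce behaves as a left fold. There is no built-in right fold, so foldr below is a small recursive helper written only for illustration. Subtraction makes the difference visible because it is not associative.

```python
from functools import reduce


def foldl(func, acc, xs):
    # ((acc - x1) - x2) - x3: combine from the left.
    return reduce(func, xs, acc)


def foldr(func, acc, xs):
    # x1 - (x2 - (x3 - acc)): combine from the right, recursively.
    if not xs:
        return acc
    return func(xs[0], foldr(func, acc, xs[1:]))


def sub(a, b):
    return a - b


print(foldl(sub, 0, [1, 2, 3]))  # -6
print(foldr(sub, 0, [1, 2, 3]))  # 2
```

The recursive foldr slices the list on every call, so it is fine for small inputs but not meant for long lists.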

C is relatively difficult to write, making it harder to analyse and test. It would be helpful to be able to do this with a higher level language, such as Python. Analysis and testing don’t affect performance of the actual data structure, so using a slower but easier and more productive language for this seems reasonable. In this article, we walk through a simple example of doing this with a built-in Python library for interfacing with C called ctypes.
c-code
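A minimal sketch of the ctypes side of that idea: mirroring a C struct layout from Python. The Point struct is hypothetical; in the article’s workflow you would instead load your compiled shared library with ctypes.CDLL and call into it.

```python
import ctypes


class Point(ctypes.Structure):
    # Mirrors the C declaration: struct Point { int x; int y; };
    _fields_ = [('x', ctypes.c_int), ('y', ctypes.c_int)]


p = Point(3, 4)
arr = (Point * 2)(Point(1, 2), p)  # a C array of two structs

print(p.x, p.y)   # 3 4
print(arr[1].y)   # 4
# Two ints and no padding: the struct is exactly twice the size of an int.
print(ctypes.sizeof(Point) == 2 * ctypes.sizeof(ctypes.c_int))  # True
```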

This short post shows how to use Python packages googlemaps and GDAL.
geo

One of the reasons why I love the Python programming language is because of how easy debugging is. You don't need a full blown IDE to be able to debug your Python application. We will go through the process of debugging a simple Python script using the pdb module from the Python standard library, which comes with most installations of Python.
debugging

We’re excited to be launching a bunch of new annotation types for images. Since the launch of our bounding box API, we’ve annotated millions of images with boxes to identify a host of different objects, from cars and hats to roof damage and parking lots. Scale is becoming an industry-standard tool for solving computer vision problems.
image processing

The post is about a terminal visualization tool lehar that is open sourced at https://github.com/darxtrix/lehar
visualization

Today the #DjangoTip will be about using select_related and prefetch_related to improve our queries' performance. Note: Django developers, do check out the Django newsletter - http://djangoweekly.com
django


Projects

django-service-objects - 28 Stars, 3 Forks
Service objects for Django

jsonschema2db - 22 Stars, 0 Forks
Generate tables dynamically from a JSON Schema and insert data.

np-to-tf-embeddings-visualiser - 19 Stars, 3 Forks
Quick function to go from a dictionary of sets of (images, labels, feature vectors) to checkpoints that can be opened in Tensorboard.

smspushtx - 16 Stars, 1 Fork
Simple PushTX server to push Bitcoin transactions via SMS.

yamdl - 15 Stars, 1 Fork
ORM-queryable YAML fixtures for Django.

BookBot - 13 Stars, 0 Forks
Reddit book bot.

contract - 4 Stars, 0 Forks
Create API contracts using Python.

slick - 3 Stars, 0 Forks
A native web-based client for Slack.

PyPocketExplore - 1 Star, 0 Forks
PyPocketExplore is a CLI-based and web-based API to access Pocket Explore data. It can be used to collect data about the most popular Pocket items for different topics.

August 20, 2017 11:39 AM

August 19, 2017


Weekly Python StackOverflow Report

(lxxxvii) stackoverflow python report

These are the ten highest-rated questions at Stack Overflow last week.
Between brackets: [question score / answers count]
Build date: 2017-08-19 21:11:51 GMT


  1. Why are reversed and sorted of different types in Python? - [34/2]
  2. Python 2 and 3 're.sub' inconsistency - [13/1]
  3. Structuring python projects without path hacks - [10/3]
  4. Pythonic way to use the second condition in list comprehensions - [9/3]
  5. Summing columns to form a new dataframe - [9/3]
  6. Melting pandas data frame with multiple variable names and multiple value names - [7/3]
  7. Pandas backward fill increment by 12 months - [7/3]
  8. How to plot date data evenly along x-axis? - [6/2]
  9. Namespaces inside class in Python3 - [6/1]
  10. How exactly does Keras take dimension arguments for LSTM / time series problems? - [6/1]

August 19, 2017 09:12 PM


Catalin George Festila

The Google Cloud SDK - part 002.

This next part of my tutorials about the Google Cloud SDK comes with some info about the project.
As you know, I used the default sample App Engine hello world standard application.
The goal is to understand how it works by working with Google's documentation and examples.
This project folder contains these files:

08/17/2017  11:12 PM                98 app.yaml
08/17/2017  11:12 PM               854 main.py
08/17/2017  11:12 PM               817 main_test.py
Let's see what these files contain.
First is app.yaml, which contains:
runtime: python27
api_version: 1
threadsafe: true

handlers:
- url: /.*
  script: main.app
Next is the main.py file:
# Copyright 2016 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import webapp2


class MainPage(webapp2.RequestHandler):
    def get(self):
        self.response.headers['Content-Type'] = 'text/plain'
        self.response.write('Hello, World!')


app = webapp2.WSGIApplication([
    ('/', MainPage),
], debug=True)
The last file in this folder is main_test.py:
# Copyright 2016 Google Inc. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import webtest

import main


def test_get():
    app = webtest.TestApp(main.app)

    response = app.get('/')

    assert response.status_int == 200
    assert response.body == 'Hello, World!'
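webapp2 and webtest belong to the Python 2.7 App Engine runtime. As a rough, framework-free analogue (not what App Engine runs), here is a hypothetical plain-WSGI version of the same Hello World app plus a test that calls it directly with only the standard library; wsgiref.util.setup_testing_defaults fills in a minimal request environment.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    # Answer every request with a plain-text greeting.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'Hello, World!']


def test_get():
    environ = {}
    setup_testing_defaults(environ)  # adds REQUEST_METHOD, PATH_INFO, etc.
    captured = {}

    def start_response(status, headers):
        captured['status'] = status
        captured['headers'] = headers

    body = b''.join(app(environ, start_response))
    assert captured['status'].startswith('200')
    assert body == b'Hello, World!'


test_get()
```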
The app.yaml file is used to configure your App Engine application's settings.
You can also have other application-level configuration files (dispatch.yaml, cron.yaml, index.yaml, and queue.yaml).
All these configuration files are placed in the top-level app directory (in this case: hello_world).
Let's see some common gcloud commands:
Let's test some changes:
First, change the text in the main.py file to something else:
self.response.write('Hello, World!')
Now run these commands:
C:\Python27\python-docs-samples\appengine\standard\hello_world>gcloud app deploy
C:\Python27\python-docs-samples\appengine\standard\hello_world>gcloud app browse
The result is shown in your browser.
You can read about these files on Google's documentation page - here.
You can also read about some gcloud commands and their reference here.

August 19, 2017 12:47 PM

August 18, 2017


Simple is Better Than Complex

How to Render Django Form Manually

Dealing with user input is a very common task in any Web application or Web site. The standard way to do it is through HTML forms, where the user inputs some data and submits it to the server, and then the server does something with it. Now, chances are you have already heard this quote: “All input is evil!” I don’t know who said that first, but it was very well said. Truth is, every input in your application is a door, a potential attack vector. So you had better secure all doors! To make your life easier, and to give you some peace of mind, Django offers a very rich, reliable and secure forms API. And you should definitely use it, no matter how simple your HTML form is.

Managing user input and processing forms is a fairly complex task, because it involves interacting with many layers of your application. It has to access the database; clean, validate, transform, and guarantee the integrity of the data; sometimes it needs to interact with multiple models, communicate human-readable error messages, and then finally it also has to translate all the Python code that represents your models into HTML inputs. In some cases, those HTML inputs may involve JavaScript and CSS code (a custom date picker or an auto-complete field, for example).

The thing is, Django handles the server-side part very well, but it doesn’t meddle much with the client-side part. The HTML forms automatically generated by Django are fully functional and can be used as they are. But they are very crude: just plain HTML, no CSS and no JavaScript. It was done that way so you have total control over how to present the forms to match your application’s Web design. The server side is a little bit different, as things are more standardized, so most of the functionality offered by the forms API works out of the box. And for the special cases, it provides many ways to customize it.

In this tutorial I will show you how to work with the rendering part, using custom CSS and making your forms prettier.



Working Example

Throughout the whole tutorial I will be using the following form definition to illustrate the examples:

forms.py

from django import forms

class ContactForm(forms.Form):
    name = forms.CharField(max_length=30)
    email = forms.EmailField(max_length=254)
    message = forms.CharField(
        max_length=2000,
        widget=forms.Textarea(),
        help_text='Write here your message!'
    )
    source = forms.CharField(       # A hidden input for internal use
        max_length=50,              # tell from which page the user sent the message
        widget=forms.HiddenInput()
    )

    def clean(self):
        cleaned_data = super(ContactForm, self).clean()
        name = cleaned_data.get('name')
        email = cleaned_data.get('email')
        message = cleaned_data.get('message')
        if not name and not email and not message:
            raise forms.ValidationError('You have to write something!')

And the following view, which just loads the form and triggers the validation process so we can see the form in different states:

views.py

from django.shortcuts import render
from .forms import ContactForm

def home(request):
    if request.method == 'POST':
        form = ContactForm(request.POST)
        if form.is_valid():
            pass  # does nothing, just trigger the validation
    else:
        form = ContactForm()
    return render(request, 'home.html', {'form': form})

Understanding the Rendering Process

In many tutorials or in the official Django documentation, it’s very common to see form templates like this:

<form method="post" novalidate>
  {% csrf_token %}
  {{ form }}
  <button type="submit">Submit</button>
</form>
Note: Maybe you are wondering about the novalidate attribute in the form. In a real case you probably won't want to use it: it prevents the browser from "validating" the data before submitting it to the server. Since the examples we are going to explore only have "required" field errors, browser validation would prevent us from seeing the actual server-side data validation and exploring the error states of the form rendering.

It looks like magic, right? This particular form could contain 50 fields, and the simple {{ form }} tag would render them all in the template.

When we write {{ form }} in a template, it’s actually accessing the __str__ method from the BaseForm class. The __str__ method is used to provide a string representation of an object. If you have a look at the source code, you will see that it returns the output of the as_table() method. So, basically, {{ form }} and {{ form.as_table }} are the same thing.
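The delegation is easy to picture with a toy class. ToyForm below is a hypothetical, drastically reduced stand-in for BaseForm that keeps only the rendering part: __str__ simply returns as_table(), and as_table() emits rows without a wrapping <table> tag.

```python
class ToyForm:
    """Hypothetical stand-in for Django's BaseForm, reduced to rendering."""

    def __init__(self, fields):
        self.fields = fields  # list of (label, input_html) pairs

    def as_table(self):
        # One <tr> per field; note there is no wrapping <table> tag.
        return '\n'.join(
            '<tr><th>{}:</th><td>{}</td></tr>'.format(label, html)
            for label, html in self.fields
        )

    def __str__(self):
        # Rendering {{ form }} ends up calling str(form), which delegates here.
        return self.as_table()


form = ToyForm([('Name', '<input type="text" name="name">')])
print(str(form) == form.as_table())  # True
```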

The forms API offers three methods to automatically render the HTML form: as_table(), as_ul(), and as_p().

They all work in more or less the same way; the difference is the HTML code that wraps the inputs.

Below is the result of the previous code snippet:

Contact Form

But if {{ form }} and {{ form.as_table }} are the same thing, the output definitely doesn’t look like a table, right? That’s because as_table() and as_ul() don’t create the <table> and <ul> tags, so we have to add them ourselves.

So, the correct way to do it would be:

<form method="post" novalidate>
  {% csrf_token %}
  <table border="1">
    {{ form }}
  </table>
  <button type="submit">Submit</button>
</form>

Contact Form

Now it makes sense, right? Without the <table> tag, the browser doesn’t really know how to render the HTML output, so it just presents all the visible fields inline, as we don’t have any CSS yet.

If you have a look at the _html_output private method defined in BaseForm, which is used by all the as_*() methods, you will see that it’s a fairly complex method with 76 lines of code that does lots of things. That’s okay, because this method is well tested and it’s part of the core of the forms API, the underlying mechanics that make things work. When working on your own form rendering logic, you won’t need to write Python code to do the job. It’s much better to do it using the Django template engine, as you can achieve cleaner, easier-to-maintain code.

I’m mentioning the _html_output method here because we can use it to analyze what kind of code it generates and what it’s really doing, so we can mimic it using the template engine. It’s also a very good exercise to read the source code and get more comfortable with it. It’s a great source of information. Even though Django’s documentation is very detailed and extensive, there are always some hidden bits here and there. You also get the chance to see, by example, how smart coders solved specific problems. After all, it’s an open source project with a mature development process that many have contributed to, so chances are you are reading high-quality code.

Anyway, here it is, in a nutshell, what the _html_output does:

Here is what the second state of the form looks like, triggering all the validation errors:

Contact Form With Errors

Now that we know what it’s doing, we can try to mimic the same behavior using the template engine. This way, we will have much more control over the rendering process:

<form method="post" novalidate>
  {% csrf_token %}

  {{ form.non_field_errors }}

  {% for hidden_field in form.hidden_fields %}
    {{ hidden_field.errors }}
    {{ hidden_field }}
  {% endfor %}

  <table border="1">
    {% for field in form.visible_fields %}
      <tr>
        <th>{{ field.label_tag }}</th>
        <td>
          {{ field.errors }}
          {{ field }}
          {{ field.help_text }}
        </td>
      </tr>
    {% endfor %}
  </table>

  <button type="submit">Submit</button>
</form>

You will notice that the result is slightly different, but all the elements are there. The thing is, the automatic generation of the HTML using just {{ form }} takes advantage of the Python language, so it can play with string concatenation, join lists (non-field errors + hidden field errors), and that sort of thing. The template engine is more limited and restrictive, but that’s not an issue. I like the Django template engine because it doesn’t let you put much code logic in the template.
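For comparison, here is a simplified, hypothetical Python sketch of that list joining: non-field errors and hidden-field errors are merged into one list of “top errors” and rendered as a single <ul>. The function name and its arguments are made up for illustration; Django’s real _html_output handles much more (labels, help text, CSS classes, row templates).

```python
def render_top_errors(non_field_errors, hidden_fields):
    # hidden_fields: list of (name, errors) pairs.
    top_errors = list(non_field_errors)
    for name, errors in hidden_fields:
        # Prefix hidden-field errors so the user knows where they come from.
        top_errors.extend('(Hidden field {}) {}'.format(name, e) for e in errors)
    if not top_errors:
        return ''
    items = ''.join('<li>{}</li>'.format(e) for e in top_errors)
    return '<ul class="errorlist">{}</ul>'.format(items)


print(render_top_errors(
    ['You have to write something!'],
    [('source', ['This field is required.'])],
))
```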

Contact Form With Errors

The only real issue is the stray “This field is required” at the top, which refers to the source field. But we can improve that. Let’s keep expanding the form rendering to get even more control over it:

<form method="post" novalidate>
  {% csrf_token %}

  {% if form.non_field_errors %}
    <ul>
      {% for error in form.non_field_errors %}
        <li>{{ error }}</li>
      {% endfor %}
    </ul>
  {% endif %}

  {% for hidden_field in form.hidden_fields %}
    {% if hidden_field.errors %}
      <ul>
        {% for error in hidden_field.errors %}
          <li>(Hidden field {{ hidden_field.name }}) {{ error }}</li>
        {% endfor %}
      </ul>
    {% endif %}
    {{ hidden_field }}
  {% endfor %}

  <table border="1">
    {% for field in form.visible_fields %}
      <tr>
        <th>{{ field.label_tag }}</th>
        <td>
          {% if field.errors %}
            <ul>
              {% for error in field.errors %}
                <li>{{ error }}</li>
              {% endfor %}
            </ul>
          {% endif %}
          {{ field }}
          {% if field.help_text %}
            <br />{{ field.help_text }}
          {% endif %}
        </td>
      </tr>
    {% endfor %}
  </table>

  <button type="submit">Submit</button>
</form>

Contact Form With Errors

Much closer, right?

Now that we know how to “expand” the {{ form }} markup, let’s try to make it look prettier, perhaps using the Bootstrap 4 library.


Accessing the Form Fields Individually

We don’t need a for loop to expose the form fields, but it’s a very convenient way to do it, especially if you don’t have any special requirements for positioning the elements.

Here is how we can refer to the form fields one by one:

<form method="post" novalidate>
  {% csrf_token %}

  {{ form.non_field_errors }}

  {{ form.source.errors }}
  {{ form.source }}

  <table border="1">

      <tr>
        <th>{{ form.name.label_tag }}</th>
        <td>
          {{ form.name.errors }}
          {{ form.name }}
        </td>
      </tr>

      <tr>
        <th>{{ form.email.label_tag }}</th>
        <td>
          {{ form.email.errors }}
          {{ form.email }}
        </td>
      </tr>

      <tr>
        <th>{{ form.message.label_tag }}</th>
        <td>
          {{ form.message.errors }}
          {{ form.message }}
          <br />
          {{ form.message.help_text }}
        </td>
      </tr>

  </table>

  <button type="submit">Submit</button>
</form>

It’s not a very DRY solution, but it’s good to know how to do it. Sometimes you may have a very specific use case where you need to position the fields in the HTML yourself.


Expanding the Form Fields

We can dig even deeper and expand the {{ field }} markup (or, if you are doing it individually, the {{ form.name }} or {{ form.email }} fields, for example). But now things get a little more complex, because we are talking about widgets. For example, the name field translates into an <input type="text"> tag, the email field translates into an <input type="email"> tag, and, even more problematic, the message field translates into a <textarea></textarea> tag.

At this point, Django makes use of small HTML templates to generate the output HTML of the fields.

So let’s see how Django does it. If we open the text.html or the email.html templates from the widgets folder, we will see it simply includes the input.html template file:

{% include "django/forms/widgets/input.html" %}

This suggests that input.html is the most generic template and that the specifics of the rendering are probably inside it. So, let’s have a look:

<input type="{{ widget.type }}"
       name="{{ widget.name }}"
       {% if widget.value != None %} value="{{ widget.value|stringformat:'s' }}"{% endif %}
       {% include "django/forms/widgets/attrs.html" %} />

Basically, this small template sets the input type and its name, which is used to access the data in the request object. For example, an input with the name “message”, if posted to the server, is accessible via request.POST['message'].

Still in the input.html template snippet, it also sets the current value of the field, or leaves it empty if there is no data. It’s an important bit in the template, because that’s what keeps the state of the form after it’s submitted but not successfully processed (the form was invalid).

Finally, it includes the attrs.html template, which is responsible for setting attributes such as maxlength, required, placeholder, style, or any other HTML attribute. It’s highly customizable in the form definition.

If you are curious about the attrs.html, here is what it looks like:

{% for name, value in widget.attrs.items %}
  {% if value is not False %}
    {{ name }}{% if value is not True %}="{{ value|stringformat:'s' }}"{% endif %}
  {% endif %}
{% endfor %}
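The same three rules from attrs.html can be written as a small Python function: an attribute whose value is False is omitted, True produces a bare attribute, and anything else becomes name="value". This render_attrs helper is a hypothetical sketch, not Django code; html.escape stands in for the template engine’s autoescaping.

```python
from html import escape


def render_attrs(attrs):
    # False -> omitted, True -> bare attribute, else name="value".
    parts = []
    for name, value in attrs.items():
        if value is False:
            continue
        if value is True:
            parts.append(name)
        else:
            parts.append('{}="{}"'.format(name, escape(str(value))))
    return ' '.join(parts)


print(render_attrs({'maxlength': 30, 'required': True, 'disabled': False}))
# maxlength="30" required
```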

Now, if you really want to create the inputs by yourself, you can do it like this (just the name field, for brevity):

<input type="text"
       name="name"
       id="id_name"
       {% if form.name.value != None %}value="{{ form.name.value|stringformat:'s' }}"{% endif %}
       maxlength="30"
       required>

Or a little bit better:

<input type="text"
       name="{{ form.name.name }}"
       id="{{ form.name.id_for_label }}"
       {% if form.name.value != None %}value="{{ form.name.value|stringformat:'s' }}"{% endif %}
       maxlength="{{ form.name.field.max_length }}"
       {% if form.name.field.required %}required{% endif %}>

Probably you already figured out that’s not the best way to work with forms. And maybe you are also asking yourself why sometimes we refer to a certain attribute as {{ form.name.<something> }} and in other situations we use {{ form.name.field.<something> }}.

I don’t want to go into much detail about it right now, but basically form.name is a BoundField (field + data) instance, and then, the form.name.field is the field definition, which is an instance of forms.CharField. That’s why some values are available in the bound field instance, and others are in the char field definition.

In any form definition, iterating over the form (its __iter__ method) yields BoundField instances; similarly, the visible_fields() and hidden_fields() methods also return BoundField instances. If you access form.fields instead, you get the field definitions (CharField, EmailField, and so on), keyed by field name. If that’s too much information right now, it’s okay; you don’t have to worry about it yet.
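The split between a field definition and a bound field can be sketched with three hypothetical toy classes (ToyCharField, ToyBoundField and ToyContactForm are made up; Django’s real classes do far more). In templates, {{ form.name }} resolves through a dictionary-style lookup, which is why form['name'] below plays that role: the bound field holds the submitted data, while bound.field holds the rules such as max_length.

```python
class ToyCharField:
    """Hypothetical field definition: knows the rules, not the data."""

    def __init__(self, max_length, required=True):
        self.max_length = max_length
        self.required = required


class ToyBoundField:
    """Hypothetical bound field: a field definition plus the form's data."""

    def __init__(self, form, field, name):
        self.form = form
        self.field = field  # like form.name.field in Django
        self.name = name

    def value(self):
        return self.form.data.get(self.name)


class ToyContactForm:
    fields = {'name': ToyCharField(max_length=30)}  # definitions only

    def __init__(self, data=None):
        self.data = data or {}

    def __getitem__(self, name):
        # Looking a field up by name returns a bound field.
        return ToyBoundField(self, self.fields[name], name)

    def __iter__(self):
        # Iterating over the form yields bound fields, not definitions.
        for name in self.fields:
            yield self[name]


form = ToyContactForm({'name': 'Ada'})
bound = form['name']
print(bound.value())           # Ada (data lives on the bound field)
print(bound.field.max_length)  # 30 (rules live on the field definition)
```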


Using Custom HTML Attributes

There are some cases where you only want to add an extra HTML attribute, like a class, a style, or a placeholder. You don’t need to expand the input field like we did in the previous example; you can do it directly in the form definition:

forms.py

from django import forms

class ColorfulContactForm(forms.Form):
    name = forms.CharField(
        max_length=30,
        widget=forms.TextInput(
            attrs={
                'style': 'border-color: blue;',
                'placeholder': 'Write your name here'
            }
        )
    )
    email = forms.EmailField(
        max_length=254,
        widget=forms.EmailInput(attrs={'style': 'border-color: green;'})
    )
    message = forms.CharField(
        max_length=2000,
        widget=forms.Textarea(attrs={'style': 'border-color: orange;'}),
        help_text='Write here your message!'
    )

Colorful Contact Form

Next, we are going to explore a third-party library that can make your life easier.


Using Django Widget Tweaks

Even though we can control the custom HTML attributes in the form definition, it would be much better if we could set them directly in the template. After all, the HTML attributes refer to the presentation of the inputs.

The django-widget-tweaks library is the right tool for the job. It lets you keep the form defaults and just add what you need. It’s very convenient, especially when working with ModelForms, as it reduces the amount of code you have to write to accomplish simple tasks.

I’m not going into much detail about django-widget-tweaks because I have an article dedicated to it: How to use django-widget-tweaks.

Here’s a quick get started guide:

First, install it using pip:

pip install django-widget-tweaks

Add it to the INSTALLED_APPS:

INSTALLED_APPS = [
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',

    'widget_tweaks',
]

Load it in the template:

{% load widget_tweaks %}
<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>Simple is Better Than Complex</title>
</head>
<body>
  ...
</body>
</html>

And we are ready to use it! We will mainly use the {% render_field %} template tag. As the next example shows, we can pass the attributes just like we would in raw HTML:

<form method="post" novalidate>
  {% csrf_token %}

  {{ form.non_field_errors }}

  {% for hidden_field in form.hidden_fields %}
    {{ hidden_field.errors }}
    {{ hidden_field }}
  {% endfor %}

  <table border="1">
    {% for field in form.visible_fields %}
      <tr>
        <th>{{ field.label_tag }}</th>
        <td>
          {{ field.errors }}
          {% render_field field style="border: 2px dashed red;" placeholder=field.name %}
          {{ field.help_text }}
        </td>
      </tr>
    {% endfor %}
  </table>

  <button type="submit">Submit</button>
</form>

Django Widget Tweaks Form

It’s very handy, especially when you just need to add a CSS class, which is exactly the case with the Bootstrap 4 form templates.


Rendering Bootstrap 4 Forms

To use the Bootstrap 4 library, I simply included the CDN link they provide in my template:

<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
  <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0-beta/css/bootstrap.min.css" integrity="sha384-/Y6pD6FV/Vv2HJnA6t+vslU6fwYXjCFtcEpHbNJ0lyAFsXTsjBbfaDjzALeQsN6M" crossorigin="anonymous">
  <title>Simple is Better Than Complex</title>
</head>

This part of the article will be more to-the-point, as I won’t explore the particularities of the Bootstrap 4 implementation. Their documentation is great and rich in examples. If you are not very familiar with it, you can jump to the Documentation / Components / Forms section for further information.

Let’s first focus on the presentation of the inputs; we will get to the errors later. Here is how we can represent the same form using Bootstrap 4 markup:

<form method="post" novalidate>
  {% csrf_token %}

  {% for hidden_field in form.hidden_fields %}
    {{ hidden_field }}
  {% endfor %}

  {% for field in form.visible_fields %}
    <div class="form-group">
      {{ field.label_tag }}
      {{ field }}
      {% if field.help_text %}
        <small class="form-text text-muted">{{ field.help_text }}</small>
      {% endif %}
    </div>
  {% endfor %}

  <button type="submit" class="btn btn-primary">Submit</button>
</form>

Bootstrap 4 Contact Form

The input fields look broken, though. That’s because Bootstrap 4 forms expect the CSS class form-control on the HTML inputs. Let’s fix it with what we learned in the previous section:

{% load widget_tweaks %}

<form method="post" novalidate>
  {% csrf_token %}

  {% for hidden_field in form.hidden_fields %}
    {{ hidden_field }}
  {% endfor %}

  {% for field in form.visible_fields %}
    <div class="form-group">
      {{ field.label_tag }}
      {% render_field field class="form-control" %}
      {% if field.help_text %}
        <small class="form-text text-muted">{{ field.help_text }}</small>
      {% endif %}
    </div>
  {% endfor %}

  <button type="submit" class="btn btn-primary">Submit</button>
</form>

Bootstrap 4 Contact Form

Much better. Now let’s handle validation and errors. I’m going to use an alert component for the non-field errors, and for the fields I will just apply the appropriate CSS classes that Bootstrap 4 provides.

{% load widget_tweaks %}

<form method="post" novalidate>
  {% csrf_token %}

  {% for hidden_field in form.hidden_fields %}
    {{ hidden_field }}
  {% endfor %}

  {% if form.non_field_errors %}
    <div class="alert alert-danger" role="alert">
      {% for error in form.non_field_errors %}
        {{ error }}
      {% endfor %}
    </div>
  {% endif %}

  {% for field in form.visible_fields %}
    <div class="form-group">
      {{ field.label_tag }}

      {% if form.is_bound %}
        {% if field.errors %}
          {% render_field field class="form-control is-invalid" %}
          {% for error in field.errors %}
            <div class="invalid-feedback">
              {{ error }}
            </div>
          {% endfor %}
        {% else %}
          {% render_field field class="form-control is-valid" %}
        {% endif %}
      {% else %}
        {% render_field field class="form-control" %}
      {% endif %}

      {% if field.help_text %}
        <small class="form-text text-muted">{{ field.help_text }}</small>
      {% endif %}
    </div>
  {% endfor %}

  <button type="submit" class="btn btn-primary">Submit</button>
</form>

And here is the result:

Bootstrap 4 Contact Form

It’s very cool because it marks the fields that passed validation in green:

Bootstrap 4 Contact Form

Let’s take a closer look at what’s going on. The code snippet could be improved, but I preferred to keep it this way so you get a better idea of the template rendering logic.

First, I check the form.is_bound attribute (it is a boolean attribute, not a method). It tells us whether the form has data or not. When we first initialize the form with form = ContactForm(), form.is_bound is False. After a submission, when the form is instantiated with the submitted data, for example form = ContactForm(request.POST), form.is_bound is True. So we can use it to know whether the validation process has already happened.

Then, once validation has occurred, I simply mark each field with the CSS class .is-invalid or .is-valid, depending on the case. These classes are responsible for painting the form components red or green.
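The branching in the template boils down to a small decision table. The sketch below restates it in plain Python, without Django; bootstrap_input_classes and ToyForm are hypothetical names used only for illustration. ToyForm copies the rule Django's BaseForm uses to set is_bound: the form is bound whenever data or files were passed, even an empty dict.

```python
def bootstrap_input_classes(form_is_bound, field_errors):
    """Return the CSS classes the template hands to {% render_field %}.

    Hypothetical helper: it restates the template's if/else logic in Python.
    """
    if not form_is_bound:
        # Unbound form (first GET): validation has not run, no colour hint.
        return "form-control"
    if field_errors:
        # Bound and invalid: Bootstrap 4 paints the input red.
        return "form-control is-invalid"
    # Bound and valid: Bootstrap 4 paints the input green.
    return "form-control is-valid"


class ToyForm:
    """Stand-in for a Django form that models only the is_bound attribute."""

    def __init__(self, data=None, files=None):
        # Same rule as Django's BaseForm: any data or files (even {}) binds it.
        self.is_bound = data is not None or files is not None


unbound = ToyForm()
bound = ToyForm({"name": "", "email": "x@example.com"})
print(bootstrap_input_classes(unbound.is_bound, []))  # form-control
print(bootstrap_input_classes(bound.is_bound, ["This field is required."]))  # form-control is-invalid
print(bootstrap_input_classes(bound.is_bound, []))  # form-control is-valid
```

Keeping this logic in the template, as the article does, makes the rendering rules visible in one place; moving it into Python is only worthwhile if several templates need it.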


Reusing Form Components

One thing we can do now is move the existing code to an external file and reuse the snippet for other forms.

includes/bs4_form.html

{% load widget_tweaks %}

{% for hidden_field in form.hidden_fields %}
  {{ hidden_field }}
{% endfor %}

{% if form.non_field_errors %}
  <div class="alert alert-danger" role="alert">
    {% for error in form.non_field_errors %}
      {{ error }}
    {% endfor %}
  </div>
{% endif %}

{% for field in form.visible_fields %}
  <div class="form-group">
    {{ field.label_tag }}

    {% if form.is_bound %}
      {% if field.errors %}
        {% render_field field class="form-control is-invalid" %}
        {% for error in field.errors %}
          <div class="invalid-feedback">
            {{ error }}
          </div>
        {% endfor %}
      {% else %}
        {% render_field field class="form-control is-valid" %}
      {% endif %}
    {% else %}
      {% render_field field class="form-control" %}
    {% endif %}

    {% if field.help_text %}
      <small class="form-text text-muted">{{ field.help_text }}</small>
    {% endif %}
  </div>
{% endfor %}

Now our form markup can be as simple as:

<form method="post" novalidate>
  {% csrf_token %}
  {% include 'includes/bs4_form.html' with form=form %}
  <button type="submit" class="btn btn-primary">Submit</button>
</form>

For example, here is the same snippet rendering UserCreationForm, a built-in form that lives in the django.contrib.auth.forms module. Below, the result:

Bootstrap 4 Contact Form


Conclusions

This article became bigger than I anticipated. I first thought about writing just a quick tutorial about form rendering. Then I remembered that I already had a to-the-point tutorial explaining how to use django-widget-tweaks. So instead, I decided to dive deep into the details and explore some of the mechanics of the forms API.

I will write a follow-up article focusing on complex forms: rendering checkboxes, select fields, and date pickers together, and developing your own custom widgets.

I hope you learned something new or enjoyed reading this article. If you have any questions or want to discuss the topic further, please leave a comment below!

As usual, you can find the source code and all the examples on GitHub.

August 18, 2017 09:00 PM


Codementor

Python Practices for Efficient Code: Performance, Memory, and Usability

Explore best practices to write Python code that executes faster, uses less memory, and looks more appealing.

August 18, 2017 10:51 AM

How I Learned Python Programming Language

Read about one person's perspective on learning to program using Python.

August 18, 2017 08:06 AM

August 17, 2017


Anwesha Das

DreamHost fighting to protect the fundamental rights of its users

Habeas data, "my data, my right", is the ethos of the right to be a free and fulfilled individual. It allows individuals to be themselves without being monitored.

In the United States, there are several legal safeguards that protect and further the concept.

The First Amendment

The First Amendment (Amendment I) to the United States Constitution establishes the

The Fourth Amendment

The Fourth Amendment (Amendment IV) to the United States Constitution

The Privacy Protection Act, 1980

The Act protects the press (journalists, media houses, newsrooms) from searches conducted by government officials. It mandates that it shall be unlawful for a government employee to search for or seize “work product” or “documentary materials” that are possessed by a person “in connection with a purpose to disseminate to the public a newspaper, book, broadcast, or other similar form of public communication”, in connection with the investigation or prosecution of a criminal offense, [42 U.S.C. §§ 2000aa (a), (b) (1996)]. An order or a subpoena is necessary to access such information and documents.

But time and again, the Government has violated or disregarded these mandates and stepped outside its bounds in the name of state security.

The present situation with DreamHost

DreamHost is a private, Los Angeles-based company. It provides web hosting, cloud computing, cloud storage, and domain name registration services. For the past few months, the company has been fighting a legal battle to protect its own fundamental rights and those of one of its customers, disruptj20.org.

What is disruptj20.org?

The company hosts disruptj20.org, a website that organized and encouraged willing individuals to protest against the present US Government. Wikipedia says: “DisruptJ20 (also Disrupt J20), a Washington, D.C.-based political organization founded in July 2016 and publicly launched on November 11 of the same year, stated its initial aim as protesting and disrupting events of the presidential inauguration of the 45th U.S.”

The Search Warrant

A search warrant was issued against DreamHost. It requires the company to disclose “the information associated with www.disruptj20.org that is stored at the premises owned, maintained, controlled, or operated by DreamHost,” [ATTACHMENT A].

The specific list of information to be disclosed and seized by the government is given in ATTACHMENT B.

How does it affect third parties (other than www.disruptj20.org)?

It demands that DreamHost hand over to the government “all files” related to the website, including the HTTP logs of its visitors, which means

In response, the company challenged the Department of Justice over the warrant, attempting through due legal process to quash the demand for seizure and disclosure of the information.

Motion to show cause

In the usual course of action, the DOJ would respond to DreamHost's inquiries. Here, instead of answering them, the DOJ chose to file a motion to show cause in the Washington, D.C. Superior Court, asking for an order compelling DreamHost to produce the records,

The Opposition

DreamHost filed an Opposition asking the court to deny the above-mentioned motion. The “Argument” section demonstrates

“This motion is our latest salvo in what has become a months-long battle to protect the identities of thousands of unwitting internet users.”

The Electronic Frontier Foundation has lent its support to DreamHost, though it is not representing the company in court. The matter will be heard on August 18 in Washington, D.C.

There are different kinds of security. Security of state power is one kind that is constantly protected, in contrast to the security of the population, which is constantly denied, negated, curbed, and restrained. Looking at the series of events, the documentary record of this particular incident raises a doubt:

Given the nature and subject of the website, the only security being considered in this case is probably the security of staying in power. It is high time that people stand up to protect the ordinary individual’s right to private space, opinion, and protest. Kudos to DreamHost for protecting an individual’s primary fundamental right: privacy.

August 17, 2017 07:13 PM


Catalin George Festila

The Google Cloud SDK - part 001.

This tutorial covers these development steps with the Google Cloud SDK and Python 2.7:


First, you need to download the Google Cloud SDK and run it.


After the GUI installation, a command window will ask you to set the default project for your work.
Welcome to the Google Cloud SDK! Run "gcloud -h" to get the list of available commands.
---
Welcome! This command will take you through the configuration of gcloud.

Your current configuration has been set to: [default]

You can skip diagnostics next time by using the following flag:
gcloud init --skip-diagnostics

Network diagnostic detects and fixes local network connection issues.
Checking network connection...done.
Reachability Check passed.
Network diagnostic (1/1 checks) passed.

You must log in to continue. Would you like to log in (Y/n)? Y
...
The next step is to get started online by deploying a Hello World app, using the Deploy a Hello World app option:

This starts an online tutorial in the right-hand area of the screen, with all the commands and steps for your Google Cloud SDK online project.
Follow these steps, and at the end you will see the online Google Cloud SDK project show Hello, World! in your browser.
The next step is to make a local project and run it.
You can use the Python docs samples from GoogleCloudPlatform, but they are not the same as the online example.
To download the GoogleCloudPlatform samples, use the git command:
C:\Python27>git clone https://github.com/GoogleCloudPlatform/python-docs-samples
Cloning into 'python-docs-samples'...
remote: Counting objects: 12126, done.
remote: Compressing objects: 100% (16/16), done.
remote: Total 12126 (delta 1), reused 10 (delta 1), pack-reused 12106
Receiving objects: 100% (12126/12126), 3.37 MiB | 359.00 KiB/s, done.
Resolving deltas: 100% (6408/6408), done.

C:\Python27>cd python-docs-samples/appengine/standard/hello_world
To deploy this sample to your Google project, use:
C:\Python27\python-docs-samples\appengine\standard\hello_world>gcloud app deploy app.yaml --project encoded-metrics-147522
Services to deploy:

descriptor: [C:\Python27\python-docs-samples\appengine\standard\hello_world\app.yaml]
source: [C:\Python27\python-docs-samples\appengine\standard\hello_world]
target project: [encoded-metrics-147522]
target service: [default]
target version: [20170817t234925]
target url: [https://encoded-metrics-147522.appspot.com]


Do you want to continue (Y/n)? Y

Beginning deployment of service [default]...
#============================================================#
#= Uploading 5 files to Google Cloud Storage =#
#============================================================#
File upload done.
Updating service [default]...done.
Waiting for operation [apps/encoded-metrics-147522/operations/XXXXXX] to complete...done.
Updating service [default]...done.
Deployed service [default] to [https://XXXXXX.appspot.com]

You can stream logs from the command line by running:
$ gcloud app logs tail -s default

To view your application in the web browser run:
$ gcloud app browse

C:\Python27\python-docs-samples\appengine\standard\hello_world>gcloud app browse
Opening [https://XXXXXX.appspot.com] in a new tab in your default browser.

C:\Python27\python-docs-samples\appengine\standard\hello_world>
This opens your application at XXXXXX.appspot.com, and the browser shows the text Hello, World!
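The post doesn't show the sample's source. For orientation only: App Engine's Python 2.7 standard environment runs WSGI applications, so a bare-bones stand-in for a Hello, World! handler could be a plain WSGI callable like the one below. This is a sketch under that assumption, not the code in python-docs-samples, and the name app is chosen here for illustration.

```python
def app(environ, start_response):
    """Minimal WSGI application: answer every request with plain text."""
    body = b"Hello, World!"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    # A WSGI app returns an iterable of byte strings.
    return [body]
```

To try it locally without the SDK, the standard library's wsgiref.simple_server.make_server("localhost", 8080, app).serve_forever() will serve it on port 8080.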



August 17, 2017 02:06 PM


Codementor

How to run a script as a background process?

A simple demonstration on how to run a script as a background process in a Debian environment.

August 17, 2017 08:42 AM


Python Bytes

#39 The new PyPI

<p><strong>Mahmoud #1:</strong> <a href="https://pypi.org/"><strong>The New PyPI</strong></a></p> <ul> <li>Donald Stufft and his PyPA team have been hard at work replacing the old pypi.python.org</li> <li>The new site is now handling almost all the old functionality (excepting deprecated features, of course): <a href="https://pypi.org/">https://pypi.org/</a></li> <li>The new site has handled downloads (presently exceeding 1PB monthly bandwidth) for a while now, and uploads as of recently.</li> <li>A nice full-fledged, open-source Python application, eagerly awaiting your review and contribution: <a href="https://github.com/pypa/warehouse/">https://github.com/pypa/warehouse/</a></li> <li>More updates at: <a href="https://mail.python.org/pipermail/distutils-sig/">https://mail.python.org/pipermail/distutils-sig/</a></li> </ul> <p><strong>Brian #2:</strong> <a href="http://makezine.com/2017/08/11/circuitpython-snakes-way-adafruit-hardware/"><strong>CircuitPython Snakes its Way onto Adafruit Hardware</strong></a></p> <ul> <li><a href="https://blog.adafruit.com/2017/01/09/welcome-to-the-adafruit-circuitpython-beta/">Adafruit announced CircuitPython in January</a> <ul> <li>“CircuitPython is based on the <a href="https://github.com/micropython/micropython">open-source</a> <a href="https://micropython.org/">MicroPython</a> which brings the popular Python language to microcontrollers. 
The goal of CircuitPython is to make hardware as simple and easy as possible.”</li> <li>Already runs on <a href="https://www.adafruit.com/product/3505">Metro M0 Express</a>, <a href="https://www.adafruit.com/product/3403">Feather M0 Express</a>, and they are working on support for <a href="https://www.adafruit.com/product/3333">Circuit Playground Express</a>, and now Gemma M0</li> </ul></li> <li>New product is <a href="https://www.adafruit.com/product/3501">Gemma M0</a>: <ul> <li><a href="https://blog.adafruit.com/2017/07/27/new-product-adafruit-gemma-m0-miniature-wearable-electronic-platform/">Announced</a> at the end of July.</li> <li>It’s about the size of a quarter and is considered a wearable computer.</li> <li>“When you plug it in, it will show up as a very small disk drive with <strong>main.py</strong> on it. Edit <strong>main.py</strong> with your favorite text editor to build your project using Python, the most popular programming language. No installs, IDE or compiler needed, so you can use it on any computer, even ChromeBooks or computers you can’t install software on. When you’re done, unplug the Gemma M0 and your code will go with you."</li> <li>They’re under $10. I gotta get one of these and play with it. 
(Anyone from Adafruit listening, want to send me one?)</li> <li>Here's the intro video for it: <a href="https://www.youtube.com/watch?v=nRE_cryQJ5c&amp;feature=youtu.be">https://www.youtube.com/watch?v=nRE_cryQJ5c&amp;feature=youtu.be</a></li> </ul></li> <li><a href="https://learn.adafruit.com/creating-and-sharing-a-circuitpython-library">Creating and sharing a CircuitPython Library</a> is a good introduction to the Python open source community, including: <ul> <li>Creating a library (package or module)</li> <li>Sharing on GitHub</li> <li>Sharing docs on ReadTheDocs</li> <li>Testing with Travis CI</li> <li>Releasing on GitHub</li> </ul></li> </ul> <p><strong>Mahmoud #3:</strong> <strong>Dataclasses</strong></p> <ul> <li>Python has had classes for a long time, but maybe it’s time for some updated syntax and semantics, something higher level perhaps?</li> <li>dataclasses is an interesting case of Python’s core dev doing their own take on community innovation (Hynek’s attrs: https://attrs.org)</li> <li>Code, issues, and draft PEP at https://github.com/ericvsmith/dataclasses</li> </ul> <p><strong>Brian #4:</strong> <a href="http://kanoki.org/2017/07/16/pandas-in-a-nutshell/"><strong>Pandas in a Nutshell</strong></a></p> <ul> <li>Jupyter Notebook style post. 
Tutorial by example with just a bit of extra text for explanation.</li> <li>Data structures: <ul> <li>Series – it’s a one dimensional array with indexes, it stores a single column or row of data in a Dataframe</li> <li>Dataframe – it’s a tabular spreadsheet like structure representing rows each of which contains one or multiple columns</li> </ul></li> <li>Series: Custom indices, adding two series, naming series, …</li> <li>Dataframes: using .head() and .tail(), info(), adding columns, adding a column as a calculation of another column, deleting a column, creating a dataframe from a dictionary, reindexing, summing columns and rows, .describe() for simple statistics, corr() for correlations, dealing with missing values, dropping rows, selecting, sorting, multi-indexing, grouping, </li> </ul> <p><strong>Mahmoud</strong> <strong>#5:</strong> <strong>Static Typing</strong></p> <ul> <li>PyBay 2017, which ended a day before recording, featured a neat panel on static typing in Python.</li> <li>One member each from Google, Quora, PyCharm, Facebook, and University of California</li> <li>Three different static analysis tools (four, if you count PyLint)</li> <li>They’re all collaborating already, and open to much more, as we can see on this collection of the stdlib’s type defs: <a href="https://github.com/python/typeshed">https://github.com/python/typeshed</a></li> <li>A fair degree of consensus around static types being most useful for testable documentation, like doctests, but with more systemic implications</li> <li>Not intended to be an algebraic type system (like Haskell, etc.)</li> </ul> <p><strong>Brian</strong> <strong>#6:</strong> <a href="https://www.fullstackpython.com/object-relational-mappers-orms.html"><strong>Full Stack Python Explains ORMs</strong></a></p> <ul> <li>What are Object Relational Mappers? 
<ul> <li>“An object-relational mapper (ORM) is a code library that automates the transfer of data stored in relational databases tables into objects that are more commonly used in application code.”</li> </ul></li> <li>Why are they useful? <ul> <li>“ORMs provide a high-level abstraction upon a relational database that allows a developer to write Python code instead of SQL to create, read, update and delete data and schemas in their database.”</li> </ul></li> <li>Do you need to use them?</li> <li>Downsides to ORMs: <ul> <li>Impedance mismatch : “the way a developer uses objects is different from how data is stored and joined in relational tables”</li> <li>Potential for reduced performance: code in the middle isn’t free</li> <li>Shifting complexity from the database into the application code : people usually don’t use database stored procedures when working with ORMs.</li> </ul></li> <li>A handful of popular ones including Django ORM, SQLAlchemy, Peewee, Pony, and SQLObject. Mostly listed as pointing out that they are active projects, brief description, and links for more info.</li> <li>Matt also has a <a href="https://www.fullstackpython.com/sqlalchemy.html">SQLAlchemy page</a> and a <a href="https://www.fullstackpython.com/peewee.html">peewee page</a> for more info on them.</li> </ul> <p><strong>Extra Mahmoud:</strong></p> <ul> <li><a href="https://github.com/python-hyper/hyperlink">hyperlink</a></li> <li><a href="https://riot.im">riot.im</a> + <a href="https://riot.im"></a><a href="https://github.com/matrix-org/synapse">(server code in Python)</a></li> </ul> <p><strong>Extra Brian:</strong></p> <ul> <li><a href="https://pragprog.com/book/bopytest/python-testing-with-pytest">Python Testing with pytest</a> has a <a href="https://forums.pragprog.com/forums/438">Discussion Forum</a>. It’s something that I think all Pragmatic books have. 
Just this morning I answered a question about the difference between monkeypatch and mock and when you would use one over the other.</li> </ul>
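Item #3 above refers to the dataclasses draft. For reference, the module later shipped in the standard library with Python 3.7 (PEP 557). A minimal sketch of what the decorator generates, using an InventoryItem class modelled on the PEP's own example (requires Python 3.7+):

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InventoryItem:
    # The annotated fields drive the generated __init__, __repr__ and __eq__.
    name: str
    unit_price: float
    quantity: int = 0
    tags: List[str] = field(default_factory=list)  # fresh list per instance

    def total_cost(self) -> float:
        return self.unit_price * self.quantity


item = InventoryItem("widget", 3.0, 10)
print(item)  # InventoryItem(name='widget', unit_price=3.0, quantity=10, tags=[])
print(item.total_cost())  # 30.0
```

field(default_factory=list) avoids the shared-mutable-default pitfall that a plain `tags: List[str] = []` would cause; the decorator rejects that form with a ValueError.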

August 17, 2017 08:00 AM


Duncan McGreggor

NASA/EOSDIS Earthdata

Update

It's been a few years since I posted on this blog; most of the technical content I've contributed in the past couple of years has been in the following:
But since the publication of the Mastering matplotlib book, I've gotten more and more into satellite data. The book, it goes without saying, used Python for the analysis and interpretation of satellite data (as one of the many topics covered). After that I spent some time working with satellite and GIS data in general using Erlang and LFE. Ultimately though, I found that more and more projects were using the JVM for this sort of work, and in particular, I noted that Clojure had begun to show up in a surprising number of GitHub projects.

EOSDIS

Enter NASA's Earth Observing System Data and Information System (see also earthdata.nasa.gov and EOSDIS on Wikipedia), a key part of the agency's Earth Science Data Systems Program. It's essentially a concerted effort to bring together the mind-blowing amounts of earth-related data being collected throughout, around, and above the world so that scientists may easily access and correlate earth science data for their research.

Related NASA projects include the following:
The acronym menagerie can be bewildering, but digging into the various NASA projects is ultimately quite rewarding (greater insights, previously unknown resources, amazing research, etc.).

Clojure

Back to the Clojure reference I made above: I've been contributing to the nasa/Common-Metadata-Repository open source project (hosted on GitHub) for a few months now, and it's been amazing to see how all this data from so many different sources gets added, indexed, updated, and generally made much more available to anyone who wants to work with it. The private sector always seems to be far ahead of large projects in terms of tech and continuous improvement of existing software, so it's been pretty cool to see a large open source project in the NASA GitHub org make so many changes that keep helping its users do better research. Even more so, users are regularly delivered new features in a large, complex collection of libraries and services, thanks in part to the benefits that come from using a functional programming language.

It may seem like nothing to you, but the fact that there are now directory pages for various data providers (e.g., GES_DISC, i.e., Goddard Earth Sciences Data and Information Services Center) makes a big difference for users of this data. The data provider pages now also offer easy access to collection links such as UARS Solar Ultraviolet Spectral Irradiance Monitor. Admittedly, the directory pages still take a while to load, but there are improvements on the way for page load times and other related tasks. If you're reading this a month after this post was written, there's a good chance it's already been fixed by now.

Summary

In summary, it's been a fun personal journey from looking at Landsat data for writing a book to working with open source projects that really help scientists to do their jobs better :-) And while I have enjoyed using the other programming languages to explore this problem space, Clojure in particular has been a delightfully powerful tool for delivering new features to the science community.

August 17, 2017 07:05 AM

August 16, 2017


Continuum Analytics News

Continuum Analytics to Share Insights at JupyterCon 2017

Thursday, August 17, 2017

Presentation topics include Jupyter and Anaconda in the enterprise; open innovation in a data-centric world; building an Excel-Python bridge; encapsulating data science using Anaconda Project and JupyterLab; deploying Jupyter dashboards for datapoints; JupyterLab

NEW YORK, August 17, 2017—Continuum Analytics, the creator and driving force behind Anaconda, the leading Python data science platform, today announced that the team will present one keynote, three talks and two tutorials at JupyterCon on August 23 and 24 in NYC, NY. The event is designed for the data science and business analyst community and offers in-depth trainings, insightful keynotes, networking events and talks exploring the Project Jupyter platform.

Peter Wang, co-founder and CTO of Continuum Analytics, will present two sessions on August 24. The first is a keynote at 9:15 am, titled “Jupyter & Anaconda: Shaking Up the Enterprise.” Peter will discuss the co-evolution of these two major players in the new open source data science ecosystem and next steps to a sustainable future. The other is a talk, “Fueling Open Innovation in a Data-Centric World,” at 11:55 am, offering Peter’s perspectives on the unique challenges of building a company that is fundamentally centered around sustainable open source innovation.

The second talk, “Leveraging Jupyter to Build an Excel-Python Bridge,” features Christine Doig, senior data scientist and product manager, and Fabio Pliger, software engineer, both of Continuum Analytics. It will take place on August 24 at 11:05 am, and Christine and Fabio will share how they created a native Microsoft Excel plug-in that provides a point-and-click interface to Python functions, enabling Excel analysts to use machine learning models, advanced interactive visualizations and distributed compute frameworks without needing to write any code. Christine will also give a talk on August 25 at 11:55 am, “Data Science Encapsulation and Deployment with Anaconda Project & JupyterLab,” where she will share how Anaconda Project and JupyterLab encapsulate data science and how to deploy self-service notebooks, interactive applications, dashboards and machine learning.

James Bednar, senior solutions architect, and Philipp Rudiger, software developer, of Continuum Analytics, will give a tutorial on August 23 at 1:30 pm titled, “Deploying Interactive Jupyter Dashboards for Visualizing Hundreds of Millions of Datapoints.” This tutorial will explore an overall workflow for building interactive dashboards, visualizing billions of data points interactively in a Jupyter notebook, with graphical widgets allowing control over data selection, filtering and display options, all using only a few dozen lines of code.

The second tutorial, “JupyterLab,” will be hosted by Steven Silvester, software engineer at Continuum Analytics and Jason Grout, software developer at Bloomberg, on August 23 at 1:30 pm. They will walk through JupyterLab as a user and as an extension author, exploring its capabilities and offering a demonstration on how to create a simple extension to the environment.

Keynote:
WHO: Peter Wang, co-founder and CTO, Anaconda Powered by Continuum Analytics
WHAT: Jupyter & Anaconda: Shaking Up the Enterprise
WHEN: August 24, 9:15am-9:25am ET
WHERE: Grand Ballroom

Talk #1:
WHO: Peter Wang, co-founder and CTO, Anaconda Powered by Continuum Analytics
WHAT: Fueling Open Innovation in a Data-Centric World
WHEN: August 24, 11:55am–12:35pm ET
WHERE: Regent Parlor

Talk #2:
WHO: 

  • Christine Doig, senior data scientist, product manager, Anaconda Powered by Continuum Analytics
  • Fabio Pliger, software engineer, Anaconda Powered by Continuum Analytics

WHAT: Leveraging Jupyter to Build an Excel-Python Bridge
WHEN: August 24, 11:05am–11:45am ET
WHERE: Murray Hill

Talk #3:
WHO: Christine Doig, senior data scientist, product manager, Anaconda Powered by Continuum Analytics
WHAT: Data Science Encapsulation and Deployment with Anaconda Project & JupyterLab
WHEN: August 25, 11:55am–12:35pm ET
WHERE: Regent Parlor

Tutorial #1:
WHO: 

  • James Bednar, senior solutions architect, Anaconda Powered by Continuum Analytics
  • Philipp Rudiger, software developer, Anaconda Powered by Continuum Analytics

WHAT: Deploying Interactive Jupyter Dashboards for Visualizing Hundreds of Millions of Datapoints
WHEN: August 23, 1:30pm–5:00pm ET
WHERE: Concourse E

Tutorial #2:
WHO: 

  • Steven Silvester, software engineer, Anaconda Powered by Continuum Analytics
  • Jason Grout, software developer, Bloomberg

WHAT: JupyterLab Tutorial
WHEN: August 23, 1:30pm–5:00pm ET
WHERE: Concourse A

###

About Anaconda Powered by Continuum Analytics
Anaconda is the leading data science platform powered by Python, the fastest growing data science language with more than 30 million downloads to date. Continuum Analytics is the creator and driving force behind Anaconda, empowering leading businesses across industries worldwide with solutions to identify patterns in data, uncover key insights and transform data into a goldmine of intelligence to solve the world’s most challenging problems. Anaconda puts superpowers into the hands of people who are changing the world. Learn more at continuum.io.

###

Media Contact:
Jill Rosenthal
InkHouse
anaconda@inkhouse.com

 

August 16, 2017 03:12 PM


Eli Bendersky

Right and left folds, primitive recursion patterns in Python and Haskell

A "fold" is a fundamental primitive in defining operations on data structures; it's particularly important in functional languages where recursion is the default tool to express repetition. In this article I'll present how left and right folds work and how they map to some fundamental recursive patterns.

The article starts with Python, which should be (or at least look) familiar to most programmers. It then switches to Haskell for a discussion of more advanced topics like the connection between folding and laziness, as well as monoids.

Extracting a fundamental recursive pattern

Let's begin by defining a couple of straightforward functions in a recursive manner, in Python. First, computing the product of all the numbers in a given list:

def product(seq):
    if not seq:
        return 1
    else:
        return seq[0] * product(seq[1:])

Needless to say, we wouldn't really write this function recursively in Python; but if we were, this is probably how we'd write it.

Now another, slightly different, function. How do we double (multiply by 2) every element in a list, recursively?

def double(seq):
    if not seq:
        return []
    else:
        return [seq[0] * 2] + double(seq[1:])

Again, ignoring the fact that Python has much better ways to do this (list comprehensions, for example), this is a straightforward recursive pattern that experienced programmers can produce in their sleep.
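For contrast, here is a sketch of non-recursive versions that are closer to idiomatic Python. It assumes Python 3.8 or newer for `math.prod`, and the `_idiomatic` suffix is our own naming, not the article's.

```python
import math

def product_idiomatic(seq):
    # math.prod (Python 3.8+) multiplies all items; its default start value is 1,
    # so an empty sequence yields 1, just like the recursive base case
    return math.prod(seq)

def double_idiomatic(seq):
    # A list comprehension replaces the explicit head/tail recursion
    return [x * 2 for x in seq]

print(product_idiomatic([2, 4, 6, 8]))  # 384
print(double_idiomatic([2, 4, 6, 8]))   # [4, 8, 12, 16]
```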

In fact, these two implementations have a lot in common. Let's try to find the commonalities.

[Figure: recursion pattern in the product function]

As this diagram shows, the functions product and double are really only different in three places:

  1. The initial value produced when the input sequence is empty.
  2. The mapping applied to every sequence value processed.
  3. The combination of the mapped sequence value with the rest of the sequence.

For product:

  1. The initial value is 1.
  2. The mapping is identity (each sequence element just keeps its value, without change).
  3. The combination is the multiplication operator.

Can you figure out the same classification for double? Take a few moments to try for yourself. Here it is:

  1. The initial value is the empty list [].
  2. The mapping takes a value, multiplies it by 2 and puts it into a list. We could express this in Python as lambda x: [x * 2].
  3. The combination is the list concatenation operator +.

With the diagram above and these examples, it's straightforward to write a generalized "recursive transform" function that can be used to implement both product and double:

def transform(init, mapping, combination, seq):
    if not seq:
        return init
    else:
        return combination(mapping(seq[0]),
                           transform(init, mapping, combination, seq[1:]))

The transform function is parameterized with init (the initial value), mapping (a mapping function applied to every sequence value), and combination (the combination of the mapped sequence value with the rest of the sequence). Given these, it implements the actual recursive traversal of the list.

Here's how we'd write product in terms of transform:

def product_with_transform(seq):
    return transform(1, lambda x: x, lambda a, b: a * b, seq)

And double:

def double_with_transform(seq):
    return transform([], lambda x: [x * 2], lambda a, b: a + b, seq)
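To check that transform really captures a reusable pattern, here is a small sketch that reuses it for a third function, summing the squares of a sequence. This is our own example, not from the article. Only the three parameters change; the traversal stays untouched.

```python
import operator

def transform(init, mapping, combination, seq):
    # Same recursive traversal as above
    if not seq:
        return init
    else:
        return combination(mapping(seq[0]),
                           transform(init, mapping, combination, seq[1:]))

def sum_of_squares_with_transform(seq):
    # init: 0 (the identity for +); mapping: square each item; combination: addition
    return transform(0, lambda x: x * x, operator.add, seq)

print(sum_of_squares_with_transform([1, 2, 3]))  # 14
```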

foldr - fold right

Generalizations like transform make functional programming fun and powerful, since they let us express complex ideas with the help of relatively few building blocks. Let's take this idea further, by generalizing transform even more. The main insight guiding us is that the mapping and the combination don't even have to be separate functions. A single function can play both roles.

In the definition of transform, combination is applied to:

  1. The result of calling mapping on the current sequence value.
  2. The recursive application of the transformation to the rest of the sequence.

We can encapsulate both in a function we call the "reduction function". This reduction function takes two arguments: the current sequence value (item), and the result of applying the full transformation to the rest of the sequence. The driving transformation that uses this reduction function is called "a right fold" (or foldr):

def foldr(func, init, seq):
    if not seq:
        return init
    else:
        return func(seq[0], foldr(func, init, seq[1:]))

We'll get to why this is called "fold" shortly; first, let's convince ourselves it really works. Here's product implemented using foldr:

def product_with_foldr(seq):
    return foldr(lambda seqval, acc: seqval * acc, 1, seq)

The key here is the func argument. In the case of product, it "reduces" the current sequence value with the "accumulator" (the result of the overall transformation invoked on the rest of the sequence) by multiplying them together. The overall result is a product of all the elements in the list.

Let's trace the calls to see the recursion pattern. I'll be using the tracing technique described in this post. For this purpose I hoisted the reducing function into a standalone function called product_reducer:

def product_reducer(seqval, acc):
    return seqval * acc

def product_with_foldr(seq):
    return foldr(product_reducer, 1, seq)

The full code for this experiment is available here. Here's the tracing of invoking product_with_foldr([2, 4, 6, 8]):

product_with_foldr([2, 4, 6, 8])
  foldr(<function product_reducer at 0x7f3415145ae8>, 1, [2, 4, 6, 8])
    foldr(<function product_reducer at 0x7f3415145ae8>, 1, [4, 6, 8])
      foldr(<function product_reducer at 0x7f3415145ae8>, 1, [6, 8])
        foldr(<function product_reducer at 0x7f3415145ae8>, 1, [8])
          foldr(<function product_reducer at 0x7f3415145ae8>, 1, [])
          --> 1
          product_reducer(8, 1)
          --> 8
        --> 8
        product_reducer(6, 8)
        --> 48
      --> 48
      product_reducer(4, 48)
      --> 192
    --> 192
    product_reducer(2, 192)
    --> 384
  --> 384

The recursion first builds a full stack of calls for every element in the sequence, until the base case (empty list) is reached. Then the calls to product_reducer start executing. The first reduces 8 (the last element in the list) with 1 (the result of the base case). The second reduces this result with 6 (the second-to-last element in the list), and so on until we reach the final result.
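The tracing technique from the linked post isn't reproduced here. As a rough stand-in, the following sketch uses a hypothetical `traced` decorator (our own helper, not the author's code). It prints each call indented by recursion depth, followed by its return value, so the output has the same nested shape as the trace above. Because foldr calls itself through its module-level name, decorating it also traces the recursive calls.

```python
import functools

_depth = 0  # current nesting level, shared by all traced functions

def traced(func):
    """Hypothetical helper: print each call and its result, indented by depth."""
    @functools.wraps(func)
    def wrapper(*args):
        global _depth
        indent = "  " * _depth
        print(f"{indent}{func.__name__}({', '.join(repr(a) for a in args)})")
        _depth += 1
        try:
            result = func(*args)
        finally:
            _depth -= 1
        print(f"{indent}--> {result!r}")
        return result
    return wrapper

@traced
def foldr(func, init, seq):
    if not seq:
        return init
    else:
        return func(seq[0], foldr(func, init, seq[1:]))

@traced
def product_reducer(seqval, acc):
    return seqval * acc

@traced
def product_with_foldr(seq):
    return foldr(product_reducer, 1, seq)
```

Calling product_with_foldr([2, 4, 6, 8]) prints one line per call and one per return. That includes a final unindented "--> 384" for the outermost call.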

Since foldr is just a generic traversal pattern, we can say that the real work here happens in the reducers. If we build a tree of invocations of product_reducer, we get:

[Figure: foldr mul tree]

And this is why it's called the right fold. It takes the rightmost element and combines it with init. Then it takes the result and combines it with the second rightmost element, and so on until the first element is reached.
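Since the tree figure isn't reproduced here, one way to see its shape is to fold with a reducer that builds a string instead of a number. This is a small sketch of our own. The foldr is the one defined above, and eval is used only to confirm that the string evaluates to the same product.

```python
def foldr(func, init, seq):
    if not seq:
        return init
    else:
        return func(seq[0], foldr(func, init, seq[1:]))

# Render each reduction step as text, so the nesting of the calls becomes visible
expr = foldr(lambda seqval, acc: f"({seqval} * {acc})", "1", [2, 4, 6, 8])
print(expr)        # (2 * (4 * (6 * (8 * 1))))
print(eval(expr))  # 384
```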

More general operations with foldr

We've seen how foldr can implement all kinds of functions on lists by encapsulating a fundamental recursive pattern. Let's see a couple more examples. The function double shown above is just a special case of the functional map primitive:

def map(mapf, seq):
    if not seq:
        return []
    else:
        return [mapf(seq[0])] + map(mapf, seq[1:])

Instead of applying a hardcoded "multiply by 2" function to each element in the sequence, map applies a user-provided unary function. Here's map implemented in terms of foldr:

def map_with_foldr(mapf, seq):
    return foldr(lambda seqval, acc: [mapf(seqval)] + acc, [], seq)

Another functional primitive that we can implement with foldr is filter. This one is just a bit trickier because we sometimes want to "skip" a value based on what the filtering predicate returns:

def filter(predicate, seq):
    if not seq:
        return []
    else:
        maybeitem = [seq[0]] if predicate(seq[0]) else []
        return maybeitem + filter(predicate, seq[1:])

Feel free to try to rewrite it with foldr as an exercise before looking at the code below. We just follow the same pattern:

def filter_with_foldr(predicate, seq):
    def reducer(seqval, acc):
        if predicate(seqval):
            return [seqval] + acc
        else:
            return acc
    return foldr(reducer, [], seq)

We can also represent less "linear" operations with foldr. For example, here's a function to reverse a sequence:

def reverse_with_foldr(seq):
    return foldr(lambda seqval, acc: acc + [seqval], [], seq)

Note how similar it is to map_with_foldr; only the order of concatenation is flipped.
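As a quick sanity check (our own sketch), the foldr-based versions can be compared against the results of Python's built-in map, filter and reversed. The block repeats the article's definitions so it runs on its own. It deliberately doesn't redefine map or filter, so the built-ins stay available.

```python
def foldr(func, init, seq):
    if not seq:
        return init
    else:
        return func(seq[0], foldr(func, init, seq[1:]))

def map_with_foldr(mapf, seq):
    return foldr(lambda seqval, acc: [mapf(seqval)] + acc, [], seq)

def filter_with_foldr(predicate, seq):
    def reducer(seqval, acc):
        # Keep seqval only when the predicate accepts it
        if predicate(seqval):
            return [seqval] + acc
        else:
            return acc
    return foldr(reducer, [], seq)

def reverse_with_foldr(seq):
    # Same shape as map_with_foldr, with the concatenation order flipped
    return foldr(lambda seqval, acc: acc + [seqval], [], seq)

print(map_with_foldr(lambda x: x * 10, [1, 2, 3]))            # [10, 20, 30]
print(filter_with_foldr(lambda x: x % 2 == 0, [1, 2, 3, 4]))  # [2, 4]
print(reverse_with_foldr([1, 2, 3]))                          # [3, 2, 1]
```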

Left-associative operations and foldl

Let's probe at some of the apparent limitations of foldr. We've seen how it can be used to easily compute the product of numbers in a sequence. What about a ratio? For the list [3, 2, 2] the ratio is "3 divided by 2, divided by 2", or 0.75 [1].

If we take product_with_foldr from above and replace * by /, we get:

>>> foldr(lambda seqval, acc: seqval / acc, 1, [3, 2, 2])
3.0

What gives? The problem here is the associativity of the operator /. Take another look at the call tree diagram shown above: it represents a right-associative evaluation. In other words, our attempt at a ratio computed 3 / (2 / (2 / 1)), which is 3 / (2 / 2), or 3.0; what we'd like instead is (3 / 2) / 2. But foldr fundamentally folds the expression from the right. This works well for associative operations like + or * (operations whose result doesn't depend on how the operands are grouped), and also for right-associative operations like exponentiation, but it doesn't work well for left-associative operations like / or -.
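A minimal sketch of the grouping problem, assuming Python 3 true division (see footnote [1]). It uses operator.truediv as the reducer and compares foldr's result with the left-grouped expression we actually want.

```python
import operator

def foldr(func, init, seq):
    # The right fold from above
    if not seq:
        return init
    else:
        return func(seq[0], foldr(func, init, seq[1:]))

# foldr groups from the right: 3 / (2 / (2 / 1))
print(foldr(operator.truediv, 1, [3, 2, 2]))  # 3.0
# The ratio we want groups from the left: (3 / 2) / 2
print((3 / 2) / 2)                            # 0.75
```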

This is where the left fold comes in. It does precisely what you'd expect - folds a sequence from the left, rather than from the right. I'm going to leave the division operation for later [2] and use another example of a left-associative operation: converting a sequence of digits into a number. For example [2, 3] represents 23, [3, 4, 5, 6] represents 3456, etc. (a related problem which is more common in introductory programming is converting a string that contains a number into an integer).

The basic reducing operation we'll use here is: acc * 10 + sequence value. To get 3456 from [3, 4, 5, 6] we'll compute:

(((((3 * 10) + 4) * 10) + 5) * 10) + 6

Note how this operation is left-associative. Reorganizing the parens to a rightmost-first evaluation would give us a completely different result.

Without further ado, here's the left fold:

def foldl(func, init, seq):
    if not seq:
        return init
    else:
        return foldl(func, func(init, seq[0]), seq[1:])

Note that the order of calls between the recursive call to itself and the call to func is reversed vs. foldr. This is also why it's customary to put acc first and seqval second in the reducing functions passed to foldl.

If we perform multiplication with foldl:

def product_with_foldl(seq):
    return foldl(lambda acc, seqval: acc * seqval, 1, seq)

We'll get this trace:

product_with_foldl([2, 4, 6, 8])
  foldl(<function product_reducer at 0x7f2924cbdc80>, 1, [2, 4, 6, 8])
    product_reducer(1, 2)
    --> 2
    foldl(<function product_reducer at 0x7f2924cbdc80>, 2, [4, 6, 8])
      product_reducer(2, 4)
      --> 8
      foldl(<function product_reducer at 0x7f2924cbdc80>, 8, [6, 8])
        product_reducer(8, 6)
        --> 48
        foldl(<function product_reducer at 0x7f2924cbdc80>, 48, [8])
          product_reducer(48, 8)
          --> 384
          foldl(<function product_reducer at 0x7f2924cbdc80>, 384, [])
          --> 384
        --> 384
      --> 384
    --> 384
  --> 384

Contrary to the right fold, the reduction function here is called immediately for each recursive step, rather than waiting for the recursion to reach the end of the sequence first. Let's draw the call graph to make the folding-from-the-left obvious:

[Figure: foldl mul tree]
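The same string-building trick (again our own sketch) makes the left-leaning shape of foldl explicit. Note that the reducer takes acc first, following the foldl convention.

```python
def foldl(func, init, seq):
    if not seq:
        return init
    else:
        return foldl(func, func(init, seq[0]), seq[1:])

# Each step wraps the accumulated text on the left and the new value on the right
expr = foldl(lambda acc, seqval: f"({acc} * {seqval})", "1", [2, 4, 6, 8])
print(expr)        # ((((1 * 2) * 4) * 6) * 8)
print(eval(expr))  # 384
```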

Now, to implement the digits-to-number task described earlier, we'll write:

def digits2num_with_foldl(seq):
    return foldl(lambda acc, seqval: acc * 10 + seqval, 0, seq)
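The related string-to-integer problem mentioned earlier fits the same left fold. This is a minimal sketch that assumes the input contains only ASCII decimal digits (no sign, no whitespace). The name str2num_with_foldl is our own.

```python
def foldl(func, init, seq):
    if not seq:
        return init
    else:
        return foldl(func, func(init, seq[0]), seq[1:])

def digits2num_with_foldl(seq):
    return foldl(lambda acc, seqval: acc * 10 + seqval, 0, seq)

def str2num_with_foldl(s):
    # Map each character to its digit value, then reuse the same left fold
    return digits2num_with_foldl([ord(c) - ord("0") for c in s])

print(str2num_with_foldl("3456"))  # 3456
```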

Stepping it up a notch - function composition with foldr

Since we're looking at functional programming primitives, it's only natural to consider how to put higher order functions to more use in combination with folds. Let's see how to express function composition; the input is a sequence of unary functions: [f, g, h] and the output is a single function that implements f(g(h(...))). Note this operation is right-associative, so it's a natural candidate for foldr:

identity = lambda x: x

def fcompose_with_foldr(fseq):
    return foldr(lambda seqval, acc: lambda x: seqval(acc(x)), identity, fseq)

In this case seqval and acc are both functions. Each step in the fold consumes a new function from the sequence and composes it on top of the accumulator (which is the function composed so far). The initial value for this fold has to be the identity for the composition operation, which just happens to be the identity function.

>>> f = fcompose_with_foldr([lambda x: x+1, lambda x: x*7, lambda x: -x])
>>> f(8)
-55
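Because function composition is associative, grouping shouldn't matter here, so a left fold ought to produce an equivalent function. Here is a small sketch to check this. fcompose_with_foldl is our own name; foldr, foldl and identity follow the article's definitions.

```python
def foldr(func, init, seq):
    if not seq:
        return init
    else:
        return func(seq[0], foldr(func, init, seq[1:]))

def foldl(func, init, seq):
    if not seq:
        return init
    else:
        return foldl(func, func(init, seq[0]), seq[1:])

def identity(x):
    return x

def fcompose_with_foldr(fseq):
    return foldr(lambda seqval, acc: lambda x: seqval(acc(x)), identity, fseq)

def fcompose_with_foldl(fseq):
    # acc composes the outer functions seen so far; each new function
    # is applied before acc, i.e. it is composed on the inside
    return foldl(lambda acc, seqval: lambda x: acc(seqval(x)), identity, fseq)

fs = [lambda x: x + 1, lambda x: x * 7, lambda x: -x]
print(fcompose_with_foldr(fs)(8))  # -55
print(fcompose_with_foldl(fs)(8))  # -55
```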

Let's take this trick one step farther. Recall how I said foldr is limited to right-associative operations? Well, I lied a little. While it's true that the fundamental recursive pattern expressed by foldr is right-associative, we can use the function composition trick to evaluate some operation on a sequence in a left-associative way. Here's the digits-to-a-number function with foldr:

def digits2num_with_foldr(seq):
    composed = foldr(
                lambda seqval, acc: lambda n: acc(n * 10 + seqval),
                identity,
                seq)
    return composed(0)

To understand what's going on, manually trace the invocation of this function on some simple sequence like [1, 2, 3]. The key to this approach is to recall that foldr gets to the end of the list before it actually starts applying the function it folds. The following is a careful trace of what happens, with the folded function replaced by g for clarity.

digits2num_with_foldr([1, 2, 3])
-> foldr(g, identity, [1, 2, 3])
-> g(1, foldr(g, identity, [2, 3]))
-> g(1, g(2, foldr(g, identity, [3])))
-> g(1, g(2, g(3, foldr(g, identity, []))))
-> g(1, g(2, g(3, identity)))
-> g(1, g(2, lambda n: identity(n * 10 + 3)))

Now things become a bit trickier to track because of the different anonymous functions and their bound variables. It helps to give these function names.

<f1 = lambda n: identity(n * 10 + 3)>
-> g(1, g(2, f1))
-> g(1, lambda n: f1(n * 10 + 2))
<f2 = lambda n: f1(n * 10 + 2)>
-> g(1, f2)
-> lambda n: f2(n * 10 + 1)

Finally, we invoke this returned function on 0:

f2(0 * 10 + 1)
-> f1(1 * 10 + 2)
-> identity(12 * 10 + 3)
-> 123

In other words, the actual computation passed to that final identity is:

((1 * 10) + 2) * 10 + 3

Which is the left-associative application of the folded function.

Expressing foldl with foldr

After the last example, it's not very surprising that we can take this approach to its logical conclusion and express the general foldl by using foldr. It's just a generalization of digits2num_with_foldr:

def foldl_with_foldr(func, init, seq):
    composed = foldr(
                lambda seqval, acc: lambda n: acc(func(n, seqval)),
                identity,
                seq)
    return composed(init)
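A quick check (our own sketch) that foldl_with_foldr really behaves as a left fold. Subtraction is used because it is not associative, so it tells a left fold apart from a right one.

```python
import operator

def foldr(func, init, seq):
    if not seq:
        return init
    else:
        return func(seq[0], foldr(func, init, seq[1:]))

def foldl(func, init, seq):
    if not seq:
        return init
    else:
        return foldl(func, func(init, seq[0]), seq[1:])

def identity(x):
    return x

def foldl_with_foldr(func, init, seq):
    composed = foldr(
                lambda seqval, acc: lambda n: acc(func(n, seqval)),
                identity,
                seq)
    return composed(init)

print(foldl(operator.sub, 10, [1, 2, 3]))             # ((10 - 1) - 2) - 3 = 4
print(foldl_with_foldr(operator.sub, 10, [1, 2, 3]))  # 4
print(foldr(operator.sub, 10, [1, 2, 3]))             # 1 - (2 - (3 - 10)) = -8
```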

In fact, the pattern expressed by foldr is very close to what is called primitive recursion by Stephen Kleene in his 1952 book Introduction to Metamathematics. In other words, foldr can be used to express a wide range of recursive patterns. I won't get into the theory here, but Graham Hutton's article A tutorial on the universality and expressiveness of fold is a good read.

foldr and foldl in Haskell

Now I'll switch gears a bit and talk about Haskell. Writing transformations with folds is not really Pythonic, but it's very much the default Haskell style. In Haskell recursion is the way to iterate.

Haskell is a lazily evaluated language, which makes the discussion of folds a bit more interesting. While this behavior isn't hard to emulate in Python, the Haskell code dealing with folds on lazy sequences is pleasantly concise and clear.

Let's start by implementing product and double, the functions this article started with. Here's the function computing a product of a sequence of numbers:

myproduct [] = 1
myproduct (x:xs) = x * myproduct xs

And a sample invocation:

*Main> myproduct [2,4,6,8]
384

The function doubling every element in a sequence:

mydouble [] = []
mydouble (x:xs) = [2 * x] ++ mydouble xs

Sample invocation:

*Main> mydouble [2,4,6,8]
[4,8,12,16]

IMHO, the Haskell variants of these functions make it very obvious that a right-fold recursive pattern is in play. The pattern matching idiom of (x:xs) on sequences splits the "head" from the "tail" of the sequence, and the combining function is applied between the head and the result of the transformation on the tail. Here's foldr in Haskell, with a type declaration that should help clarify what goes where:

myfoldr :: (b -> a -> a) -> a -> [b] -> a
myfoldr _ z [] = z
myfoldr f z (x:xs) = f x (myfoldr f z xs)

If you're not familiar with Haskell this code may look foreign, but it's really a one-to-one mapping of the Python code for foldr, using some Haskell idioms like pattern matching.

These are the product and doubling functions implemented with myfoldr, using currying to avoid specifying the last parameter:

myproductWithFoldr = myfoldr (*) 1

mydoubleWithFoldr = myfoldr (\x acc -> [2 * x] ++ acc) []

Haskell also has a built-in foldl which performs the left fold. Here's how we could write our own:

myfoldl :: (a -> b -> a) -> a -> [b] -> a
myfoldl _ z [] = z
myfoldl f z (x:xs) = myfoldl f (f z x) xs

And this is how we'd write the left-associative function to convert a sequence of digits into a number using this left fold:

digitsToNumWithFoldl = myfoldl (\acc x -> acc * 10 + x) 0

Folds, laziness and infinite lists

Haskell evaluates all expressions lazily by default, which can be either a blessing or a curse for folds, depending on what we need to do exactly. Let's start by looking at the cool applications of laziness with foldr.

Given infinite lists (yes, Haskell easily supports infinite lists because of laziness), it's fairly easy to run short-circuiting algorithms on them with foldr. By short-circuiting I mean an algorithm that terminates the recursion at some point throughout the list, based on a condition.

As a silly but educational example, consider doubling every element in a sequence but only until a 5 is encountered, at which point we stop:

> foldr (\x acc -> if x == 5 then [] else [2 * x] ++ acc) [] [1,2,3,4,5,6,7]
[2,4,6,8]

Now let's try the same on an infinite list:

> foldr (\x acc -> if x == 5 then [] else [2 * x] ++ acc) [] [1..]
[2,4,6,8]

It terminates and returns the right answer! Even though our earlier stack trace of folding makes it appear like we iterate all the way to the end of the input list, this is not the case for our folding function. Since the folding function doesn't use acc when x == 5, Haskell won't evaluate the recursive fold further [3].
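Python evaluates eagerly, but the same short-circuiting can be roughly emulated by passing the reducer a thunk (a zero-argument function) for the rest of the fold, instead of its value. This is a sketch under stated assumptions. lazy_foldr is our own hypothetical helper. It accepts any iterable, including infinite iterators such as itertools.count. Each thunk must be forced at most once, because all the thunks share a single iterator.

```python
from itertools import count

def lazy_foldr(func, init, iterable):
    """Right fold where func(item, rest) receives a thunk for the rest of the fold.

    Assumption: each thunk is forced at most once, since all thunks share
    one iterator; this is what lets the input be an infinite iterator.
    """
    it = iter(iterable)

    def rest():
        # Pull the next item only when the reducer asks for the rest of the fold
        try:
            item = next(it)
        except StopIteration:
            return init
        return func(item, rest)

    return rest()

def double_until_5(x, rest):
    # Like the Haskell lambda: stop (and never force rest) once we see a 5
    if x == 5:
        return []
    return [2 * x] + rest()

print(lazy_foldr(double_until_5, [], [1, 2, 3, 4, 5, 6, 7]))  # [2, 4, 6, 8]
print(lazy_foldr(double_until_5, [], count(1)))               # [2, 4, 6, 8]
```

Each forced thunk still adds a Python stack frame, so this only helps when the fold stops reasonably early.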

The same trick will not work with foldl, since foldl is not lazy in its second argument. Because of this, Haskell programmers are usually pointed to foldl', the eager version of foldl, as the better option. foldl' evaluates its arguments eagerly, meaning that:

  1. It won't support infinite sequences (but neither does foldl!)
  2. It's significantly more efficient than foldl because it can be easily turned into a loop (note that the recursion in foldl is a tail call, and the eager foldl' doesn't have to build a thunk of increasing size due to laziness in the first argument).
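CPython doesn't eliminate tail calls, so the recursive Python definitions earlier in this article hit the interpreter's recursion limit (1000 frames by default) on long lists. Below is a sketch of loop-based versions; the names are our own, and the argument order matches the article's foldl and foldr. The tail call in foldl turns directly into a loop. An eager right fold can instead walk the sequence from the end, which gives up any laziness.

```python
def foldl_iter(func, init, seq):
    # The tail call foldl(func, func(acc, x), rest) becomes a plain loop
    acc = init
    for seqval in seq:
        acc = func(acc, seqval)
    return acc

def foldr_iter(func, init, seq):
    # Start from the rightmost element, the same order in which the
    # recursive foldr's reductions actually execute
    acc = init
    for seqval in reversed(seq):
        acc = func(seqval, acc)
    return acc

big = list(range(100_000))
print(foldl_iter(lambda acc, x: acc + x, 0, big))  # 4999950000, no RecursionError
print(foldr_iter(lambda x, acc: x + acc, 0, big))  # 4999950000
```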

There is also an eager version of the right fold - foldr', which can be more efficient than foldr in some cases; it's not in Prelude but can be imported from Data.Foldable [4].

Folding vs. reducing

Our earlier discussion of folds may have reminded you of reduce (a built-in function in Python 2, moved to functools.reduce in Python 3), which seems to be doing something similar. In fact, Python's reduce implements the left fold where the first element in the sequence is used as the zero value. One nice property of reduce is that it doesn't require an explicit zero value (though it does support one via an optional initializer parameter, which is useful when the sequence may be empty, for example).
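A short sketch (our own) of the correspondence between Python 3's functools.reduce and the foldl defined earlier, including the empty-sequence behavior.

```python
import functools
import operator

def foldl(func, init, seq):
    if not seq:
        return init
    else:
        return foldl(func, func(init, seq[0]), seq[1:])

# Without an initializer, reduce seeds the left fold with the first element
print(functools.reduce(operator.mul, [2, 4, 6, 8]))   # 384
print(foldl(operator.mul, 1, [2, 4, 6, 8]))           # 384

# reduce folds from the left, so a non-associative operator shows the grouping
print(functools.reduce(operator.sub, [10, 1, 2, 3]))  # ((10 - 1) - 2) - 3 = 4

# The optional initializer makes the empty case well-defined;
# functools.reduce(operator.add, []) would raise TypeError instead
print(functools.reduce(operator.add, [], 0))          # 0
```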

Haskell has its own variations of folds that implement reduce - they have the digit 1 as suffix: foldl1 is the more direct equivalent of Python's reduce - it doesn't need an initializer and folds the sequence from the left. foldr1 is similar, but folds from the right. Both have eager variants: foldl1' and foldr1'.

I promised to revisit calculating the ratio of a sequence; here's a way, in Haskell:

myratioWithFoldl = foldl1 (/)

The problem with using a regular foldl is that there's no natural identity value to use on the leftmost side of a ratio (on the rightmost side 1 works, but the associativity is wrong). This is not an issue for foldl1, which starts the recursion with the first item in the sequence, rather than an explicit initial value.

*Main> myratioWithFoldl [3,2,2]
0.75

Note that foldl1 will throw an exception if the given sequence is empty, since it needs at least one item in there.
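For comparison, here is a Python sketch of foldl1 and foldr1 built on the article's foldl and foldr. Raising ValueError on an empty sequence is our own choice; Haskell's versions signal an error in their own way.

```python
import operator

def foldl(func, init, seq):
    if not seq:
        return init
    else:
        return foldl(func, func(init, seq[0]), seq[1:])

def foldr(func, init, seq):
    if not seq:
        return init
    else:
        return func(seq[0], foldr(func, init, seq[1:]))

def foldl1(func, seq):
    # Seed the left fold with the first element instead of an explicit zero
    if not seq:
        raise ValueError("foldl1: empty sequence")
    return foldl(func, seq[0], seq[1:])

def foldr1(func, seq):
    # Seed the right fold with the last element instead of an explicit zero
    if not seq:
        raise ValueError("foldr1: empty sequence")
    return foldr(func, seq[-1], seq[:-1])

print(foldl1(operator.truediv, [3, 2, 2]))  # (3 / 2) / 2 = 0.75
print(foldr1(operator.truediv, [3, 2, 2]))  # 3 / (2 / 2) = 3.0
```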

Folding arbitrary data structures

The built-in folds in Haskell are defined on lists. However, lists are not the only data structure we should be able to fold. Why can't we fold maps (say, summing up all the keys), or even custom data structures? What is the minimum amount of abstraction we can extract to enable folding?

Let's start by defining a simple binary tree data structure:

data Tree a = Empty | Leaf a | Node a (Tree a) (Tree a)
  deriving Show

-- A sample tree with a few nodes
t1 = Node 10 (Node 20 (Leaf 4) (Leaf 6)) (Leaf 7)

Suppose we want to fold the tree with (+), summing up all the values contained within it. How do we go about it? foldr or foldl won't cut it here - they expect [a], not Tree a. We could try to write our own foldr:

foldTree :: (b -> a -> a) -> a -> Tree b -> a
foldTree _ z Empty = z
foldTree f z (Leaf x) = ??
foldTree f z (Node x left right) = ??

There's a problem, however. Since we want to support an arbitrary folding result, we're not quite sure what to substitute for the ??s in the code above. In foldr, the folding function takes the accumulator and the next value in the sequence, but for trees it's not so simple. We may encounter a single leaf, and we may encounter several values to summarize; for the latter we have to invoke f on x as well as on the result of folding left and right. So it's not clear what the type of f should be - (b -> a -> a) doesn't appear to work [5].

A useful Haskell abstraction that can help us solve this problem is Monoid. A monoid is any data type that has an identity element (called mempty) and an associative binary operation called mappend. Monoids are, therefore, amenable to "summarization".

foldTree :: Monoid a => (b -> a) -> Tree b -> a
foldTree _ Empty = mempty
foldTree f (Leaf x) = f x
foldTree f (Node x left right) = (foldTree f left) <> f x <> (foldTree f right)

We no longer need to pass in an explicit zero element: since a is a Monoid, we have its mempty. Also, we can now apply a single (b -> a) function onto every element in the tree, and combine the results together into a summary using a's mappend (<> is the infix synonym of mappend).

The challenge of using foldTree is that we now actually need to use a unary function that returns a Monoid. Luckily, Haskell has some useful built-in monoids. For example, Data.Monoid.Sum wraps numbers into monoids under addition. We can find the sum of all elements in our tree t1 using foldTree and Sum:

> foldTree Sum t1
Sum {getSum = 47}

Similarly, Data.Monoid.Product wraps numbers into monoids under multiplication:

> foldTree Product t1
Product {getProduct = 33600}
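Python has no typeclasses, so a rough Python rendering of foldTree has to pass the monoid explicitly. The sketch below is our own hypothetical encoding, not a standard API. The tree is a set of small dataclasses, and a Monoid is a (mempty, mappend) pair, with SUM and PRODUCT standing in for Haskell's Sum and Product wrappers.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable, NamedTuple

# Hypothetical Python encoding of: data Tree a = Empty | Leaf a | Node a (Tree a) (Tree a)
@dataclass
class Empty:
    pass

@dataclass
class Leaf:
    value: Any

@dataclass
class Node:
    value: Any
    left: Any
    right: Any

class Monoid(NamedTuple):
    # An identity element plus an associative binary operation
    mempty: Any
    mappend: Callable[[Any, Any], Any]

SUM = Monoid(0, operator.add)
PRODUCT = Monoid(1, operator.mul)

def fold_tree(monoid, f, tree):
    # Mirrors foldTree: map each value with f, combine left <> x <> right
    if isinstance(tree, Empty):
        return monoid.mempty
    if isinstance(tree, Leaf):
        return f(tree.value)
    return monoid.mappend(fold_tree(monoid, f, tree.left),
                          monoid.mappend(f(tree.value),
                                         fold_tree(monoid, f, tree.right)))

t1 = Node(10, Node(20, Leaf(4), Leaf(6)), Leaf(7))
print(fold_tree(SUM, lambda x: x, t1))      # 47
print(fold_tree(PRODUCT, lambda x: x, t1))  # 33600
```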

Haskell provides a built-in typeclass, Foldable (in the Data.Foldable module), that only requires us to implement a similar mapping function, foldMap, and then takes care of defining many folding methods. Here's the instance for our tree:

instance Foldable Tree where
  foldMap f Empty = mempty
  foldMap f (Leaf x) = f x
  foldMap f (Node x left right) = foldMap f left <> f x <> foldMap f right

And we'll automatically have foldr, foldl and other folding methods available on Tree objects:

> Data.Foldable.foldr (+) 0 t1
47

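Note that we can pass a regular binary (+) here; Data.Foldable employs a bit of magic to turn this into a properly monoidal operation (its default foldr wraps each partially applied function in the Endo monoid, whose operation is function composition). We get many more useful methods on trees just from implementing foldMap: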

> Data.Foldable.toList t1
[4,20,6,10,7]
> Data.Foldable.elem 6 t1
True
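To see roughly where the "magic" comes from, here is a Python sketch of deriving foldr, toList and elem from a single fold_map. It is our own code and uses the same hypothetical tree and monoid encoding as a simple stand-in for Haskell's instances. The foldr derivation relies on the monoid of functions under composition, which is the idea behind Haskell's Endo. The other two use list concatenation and logical "or".

```python
from dataclasses import dataclass
from typing import Any, Callable, NamedTuple

@dataclass
class Empty:
    pass

@dataclass
class Leaf:
    value: Any

@dataclass
class Node:
    value: Any
    left: Any
    right: Any

class Monoid(NamedTuple):
    mempty: Any
    mappend: Callable[[Any, Any], Any]

def fold_map(monoid, f, tree):
    # The single method the Foldable instance defines: left <> f x <> right
    if isinstance(tree, Empty):
        return monoid.mempty
    if isinstance(tree, Leaf):
        return f(tree.value)
    return monoid.mappend(fold_map(monoid, f, tree.left),
                          monoid.mappend(f(tree.value),
                                         fold_map(monoid, f, tree.right)))

# Functions under composition form a monoid (the idea behind Haskell's Endo)
COMPOSE = Monoid(lambda x: x, lambda g, h: lambda x: g(h(x)))
LIST = Monoid([], lambda a, b: a + b)
ANY = Monoid(False, lambda a, b: a or b)

def tree_foldr(func, init, tree):
    # Turn each value into "apply func with this value", compose them all,
    # then apply the composed function to init
    composed = fold_map(COMPOSE, lambda x: (lambda acc: func(x, acc)), tree)
    return composed(init)

def to_list(tree):
    return fold_map(LIST, lambda x: [x], tree)

def elem(value, tree):
    return fold_map(ANY, lambda x: x == value, tree)

t1 = Node(10, Node(20, Leaf(4), Leaf(6)), Leaf(7))
print(tree_foldr(lambda x, acc: x + acc, 0, t1))  # 47
print(to_list(t1))                               # [4, 20, 6, 10, 7]
print(elem(6, t1))                               # True
```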

It's possible that for some special data structure these methods can be implemented more efficiently than by inference from foldMap, but nothing is stopping us from redefining specific methods in our Foldable instance. It's pretty cool, however, to see just how much functionality can be derived from having a single mapping method (and the Monoid guarantees) defined. See the documentation of Data.Foldable for more details.


[1]Note that I'm using Python 3 for all the code in this article; hence, Python 3's division semantics apply.
[2]Division has a problem with not having a natural "zero" element; therefore, it's more suitable for foldl1 and reduce, which are described later on.
[3]I'm prefixing most functions here with my since they have Haskell standard library builtin equivalents; while it's possible to avoid the name clashes with some import tricks, custom names are the least-effort approach, also for copy-pasting these code snippets into a REPL.
[4]I realize this is a very rudimentary explanation of Haskell laziness, but going deeper is really out of scope of this article. There are plenty of resources online to read about lazy vs. eager evaluation, if you're interested.
[5]We could try to apply f between the leaf value and z, but it's not clear in what order this should be done (what if f is sensitive to order?). Similarly for a Node, since there are no guarantees on the associativity of f, it's hard to predict what is the right way of applying it multiple times.

August 16, 2017 12:48 PM