
Planet Python

Last update: February 25, 2017 10:48 AM

February 24, 2017


DataCamp

Matplotlib Cheat Sheet: Plotting in Python

Data visualization and storytelling with your data are essential skills that every data scientist needs in order to communicate insights gained from analyses effectively to any audience.

For most beginners, the first package they use to get started with data visualization and storytelling is, naturally, Matplotlib: a Python 2D plotting library that enables users to make publication-quality figures. What might be even more convincing is that other packages, such as Pandas, intend to build more plotting integration with Matplotlib as time goes on.

However, what might slow beginners down is that this package is quite extensive. There is so much you can do with it, and it can be hard to keep things structured while you're learning how to work with Matplotlib.

DataCamp has created a Matplotlib cheat sheet for those who might already know how to use the package to make beautiful plots in Python, but who still want to keep a one-page reference handy. Of course, for those who don't yet know how to work with Matplotlib, it might be the extra push they need to finally get started with data visualization in Python.

(By the way, if you want to get started with this Python package, you might want to consider our Matplotlib tutorial.)

You'll see that this cheat sheet presents you with the six basic steps that you can go through to make beautiful plots. 

Check out the infographic by clicking on the button below:

Python Matplotlib cheat sheet

With this handy reference, you'll familiarize yourself in no time with the basics of Matplotlib: you'll learn how you can prepare your data, create a new plot, use some basic plotting routines to your advantage, add customizations to your plots, and save, show and close the plots that you make.

What might have looked difficult before will definitely be more clear once you start using this cheat sheet! 

Also, don't miss out on our other cheat sheets for data science, which cover SciPy, NumPy, Scikit-Learn, Bokeh, Pandas, and the Python basics.

February 24, 2017 09:11 PM


Weekly Python StackOverflow Report

(lxii) stackoverflow python report

These are the ten highest-rated questions on Stack Overflow from last week.
Between brackets: [question score / answers count]
Build date: 2017-02-24 19:53:04 GMT


  1. Why is x**4.0 faster than x**4 in Python 3? - [119/2]
  2. How to limit the size of a comprehension? - [20/4]
  3. How to properly split this list of strings? - [13/5]
  4. Why is the __dict__ of instances so small in Python 3? - [11/1]
  5. Python Asynchronous Comprehensions - how do they work? - [10/2]
  6. Immutability in Python - [10/2]
  7. Order-invariant hash in Python - [9/2]
  8. Assign a number to each unique value in a list - [8/7]
  9. How to add a shared x-label and y-label to a plot created with pandas' plot? - [8/3]
  10. What happens when I inherit from an instance instead of a class in Python? - [8/2]

February 24, 2017 07:53 PM


Will Kahn-Greene

Who uses my stuff?

Summary

I work on a lot of different things. Some are applications, some are libraries, some I started, some other people started, etc. I have way more stuff to do than I could possibly get done, so I try to spend my time on things "that matter".

For Open Source software that doesn't have an established community, this is difficult.

This post is a wandering stream of consciousness covering my journey figuring out who uses Bleach.

Read more… (4 mins to read)

February 24, 2017 04:00 PM


Rene Dudfield

setup.cfg - a solution to python config file soup? A howto guide.

Sick of config file soup cluttering up your repo? Me too. However there is a way to at least clean it up for many python tools.


Some of the tools you might use and the config files they support...
  • flake8 - .flake8, setup.cfg, tox.ini, and config/flake8 on Windows
  • pytest - pytest.ini, tox.ini, setup.cfg
  • coverage.py - .coveragerc, setup.cfg, tox.ini
  • mypy - setup.cfg, mypy.ini
  • tox - tox.ini
 Can mypy use setup.cfg as well?
OK, you've convinced me. -- Guido

With that, mypy now also supports setup.cfg, and we can all remove many more config files.

The rules for precedence are easy:
  1. Read the --config-file option; if the file is missing or invalid, exit.
  2. Read the tool's own ini file (e.g. mypy.ini); if it's valid, stop.
  3. Otherwise, fall back to setup.cfg.

 

How to config with setup.cfg?

Here's a list of links to the configuration documentation for each tool's setup.cfg support.

What does a setup.cfg look like now?

Here's an example setup.cfg for you with various tools configured. (note these are nonsensical example configs, not what I suggest you use!)

# http://coverage.readthedocs.io/en/latest/config.html
[coverage:run]
timid = True

# http://pytest.org/latest/customize.html#adding-default-options
[tool:pytest]
addopts = -v --cov pygameweb pygameweb/ tests/

# http://mypy.readthedocs.io/en/latest/config_file.html
[mypy]
python_version = 2.7

[flake8]
max-line-length = 120
max-complexity = 10
exclude = build,dist,docs/conf.py,somepackage/migrations,*.egg-info

# Run with: pylint --rcfile=setup.cfg somepackage
[pylint]
disable = C0103,C0111
ignore = migrations
ignore-docstrings = yes
output-format = colorized



February 24, 2017 02:01 PM


Bhishan Bhandari

Implementing Stack using List in Python – Python Programming Essentials

Intro: a stack is a collection of objects inserted and removed in last-in, first-out (LIFO) fashion. Objects can be pushed onto the stack at any time, but only the most recently inserted object, the top of the stack, can be accessed or removed. Realization of stack operations using a list […]
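Those operations map naturally onto a Python list, where append and pop both act on the same end. A minimal sketch (hypothetical class, not the article's own code):

```python
class Stack:
    """A LIFO stack backed by a Python list."""

    def __init__(self):
        self._items = []

    def push(self, item):
        """Insert item at the top of the stack (amortized O(1))."""
        self._items.append(item)

    def pop(self):
        """Remove and return the top item; raise on an empty stack."""
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def top(self):
        """Return the top item without removing it."""
        if not self._items:
            raise IndexError("top of empty stack")
        return self._items[-1]

    def __len__(self):
        return len(self._items)
```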

February 24, 2017 05:52 AM


Vasudev Ram

Perl-like "unless" (reverse if) feature in Python

By Vasudev Ram



Flowchart image attribution

I was mentally reviewing some topics to discuss for a Python training program I was running. Among the topics were statements, including the if statement. I recollected that some languages I knew of, such as Perl, have an unless statement, which is like a reverse if statement, in that only the first nested suite (of statements) is executed if the Boolean condition is false, whereas only the second nested suite is executed if the condition is true. See the examples below.

The term "suite" used above, follows the terminology used in Python documentation such as the Python Language Reference; see this if statement definition, for example.

That is, for the if statement:
if condition:
    suite1  # nested suite 1
else:
    suite2  # nested suite 2

results in suite1 being run if condition is true, and suite2 being run if condition is false, whereas, for the unless statement:

unless condition:
    suite1
else:
    suite2

the reverse holds true.

Of course, there is no unless statement in Python. So I got the idea of simulating it, at least partially, with a function, just for fun and as an experiment. Here is the first version, in file unless.py:
# unless.py v1

# A simple program to partially simulate the unless statement
# (a sort of reverse if statement) available in languages like Perl.
# The simulation is done by a function, not by a statement.

# Author: Vasudev Ram
# Web site: https://vasudevram.github.io
# Blog: https://jugad2.blogspot.com
# Product store: https://gumroad.com

# Define the unless function.
def unless(cond, func_if_false, func_if_true):
    if not cond:
        func_if_false()
    else:
        func_if_true()

# Test it.
def foo(): print "this is foo"
def bar(): print "this is bar"

a = 1
# Read the call below as:
# Unless a % 2 == 0, call foo, else call bar
unless(a % 2 == 0, foo, bar)
# Results in foo() being called.

a = 2
# Read the call below as:
# Unless a % 2 == 0, call foo, else call bar
unless(a % 2 == 0, foo, bar)
# Results in bar() being called.
Here is the output:
$ python unless.py
this is foo
this is bar
This simulation of unless works because functions are objects in Python (since almost everything is an object in Python, like almost everything in Unix is a file), so functions can be passed to other functions as arguments (by passing just their names, without following the names with parentheses).

Then, inside the unless function, when you apply the parentheses to those two function names, they get called.

This approach to simulation of the unless statement has some limitations, of course. One is that you cannot pass arguments to the functions [1]. (You could still make them do different things on different calls by using global variables (not good), reading from files, or reading from a database, so that their inputs could vary on each call).

[1] You can actually pass arguments to the functions in a few ways, such as using the *args and **kwargs features of Python, as additional arguments to unless() and then forwarding those arguments to the func_if_false() and func_if_true() calls inside unless().
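The footnote's idea can be sketched like this (in Python 3 syntax; a hypothetical variant, not code from the original post):

```python
# unless.py v2 (sketch): forward positional and keyword arguments
def unless(cond, func_if_false, func_if_true, *args, **kwargs):
    """Call func_if_false(*args, **kwargs) when cond is false,
    otherwise call func_if_true(*args, **kwargs)."""
    if not cond:
        return func_if_false(*args, **kwargs)
    return func_if_true(*args, **kwargs)

def double(x):
    return x * 2

def triple(x):
    return x * 3

# 5 is odd, so the condition is false and double(5) runs
result = unless(5 % 2 == 0, double, triple, 5)  # result is 10
```

Returning the called function's value is a small extra convenience over v1, which discarded it.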

Another limitation is that this simulation does not support the elif clause.

However, none of the above limitations matter, of course, because you can also get the effect of the unless statement (i.e. a reverse if) by just negating the Boolean condition (with the not operator) of an if statement. As I said, I just tried this for fun.

The image at the top of the post is of a flowchart.

For something on similar lines (i.e. simulating a language feature with some other code), but for the C switch statement simulated (partially) in Python, see this post I wrote a few months ago:

Simulating the C switch statement in Python

And speaking of Python language features, etc., here is a podcast interview with Guido van Rossum (creator of the Python language), about the past, present and future of Python.


Enjoy.

- Vasudev Ram - Online Python training and consulting

Get updates (via Gumroad) on my forthcoming apps and content.

Jump to posts: Python * DLang * xtopdf

Subscribe to my blog by email

My ActiveState Code recipes

Follow me on: LinkedIn * Twitter




February 24, 2017 04:33 AM

February 23, 2017


Catalin George Festila

The bad and good urllib.

This is a simple python script:

import urllib
opener = urllib.FancyURLopener({})
f = opener.open("http://www.ra___aer.ro/")
d=f.read()
fo = open('workfile.txt', 'w')
fo.write(d)
fo.close()
The really bad news comes from here:
http://blog.blindspotsecurity.com/2017/02/advisory-javapython-ftp-injections.html

February 23, 2017 03:11 PM


Stories in My Pocket

What is Your Burnout Telling You?

I am glad that mental health is being discussed more often in the programming world. In particular, I would like to thank Kenneth Reitz for his transparency over the last few years and contributing his recent essay on developer burnout.

Having just recently experienced a period of burnout, I would like to share some ideas that might help you skirt by it in your own life.

And, before I go on, let me say loud and clear, there is absolutely no shame in experiencing, or reaching out for help from, anxiety, fear, anger, burnout, depression, or any other mental condition.

If you think you might be struggling with something, please reach out for help! You might have to reach out several times to multiple sources, but remember you are too valuable to waste time with this funk.




My experience is different than Kenneth's. Instead of maintaining open source projects, I work on a small team that builds web projects to meet the business needs of our clients.

Some things he wrote about didn't quite resonate with my situation, so my hope is that this article will speak to others whose experiences are similar to mine.

Trapped?

I've identified that I'm burnt out when I have the feeling of being trapped in a situation with an uncertain end.

One time, I was the only programmer scheduled on a project for a period of months. Another time, I was experiencing a lot of non-constructive criticism for the graphic design work I was doing.

In these situations and others, I felt trapped in an emotional well, thinking that there was no way someone else could help, or would want to help me, and I had no idea when this situation would end.

Emotions like these serve one very crucial role: they are a great indicator of what's going on internally. Instead of allowing these emotions to fuel anxiety and make things worse, I'm learning there is great freedom in realizing:

  • Emotions like these trigger thoughts that aren't true.
  • You do not have to respond to the emotions you are experiencing at any given moment. Instead, treat them as what they are: a warning light that something is off and needs to be adjusted.

In this case, I've identified that I need to be vocal to my manager whenever I realize I need support.

In my most recent time of burnout, I waited far too long before letting my manager know. I have a "workhorse" mentality that contributed to my burnout. I thought I could shoulder the load for a period of time, to get my team through. But instead of helping my team with this behavior, I hurt it by the reduced quality of my contributions to the project. And in the long run, I made things more difficult for my team.

Cool your jets

In times of burnout, I felt pressure coming from one area of my life that was off balance, say from my team at work. As the burnout condition persisted, the pressure would spread to other areas of my life.

I believe my recent experience of burnout was made worse by this very blog, for I launched it when I was in a period of light burnout. Once my work was public, my ambition put a lot of pressure on me to create high-quality content and build an audience. That pressure was too much, and I realized I had to step back.

In this case, I had no one to delegate the work to, so progress would have to stop. I had to be okay with that.

Throughout the last few months, I found it challenging to restrain my ambition, but I had a few great insights that helped to calm it down.

One was from a recent three-part episode of the Developer Tea podcast in which Jonathan interviewed Kalid Azad. Kalid has built an amazing collection of writings at his website, Better Explained. He revealed in the interview that he hardly contributed to his website for a year in the middle of its life, and yet it is a very successful site. So what do I have to worry about?

The second was from a book by Chip Ingram that I recently started reading on the spiritual side of ambition, in which he draws on research to segment people into four groups. He advises the group I identify with to cool their jets, to allow a deeper purpose to set in rather than just checking off tasks on a to-do list.

These relieved a lot of pressure from my ambition, and since then I've heard similar advice from other parties. So if any of you share this kind of ambition, I hope you can take heart that there is a large community of successful people out there telling us to be patient.

With that advice in mind, I'm going to try to allow myself to write during this recovery period, but only when I'm inspired, and only while it's still fun.

Prevention and recovery via balance

People believe living a balanced life will prevent many issues like burnout, but what does this balance look like?

I like Dan Miller's approach in his book, 48 Days to the Work You Love. He describes life as a wheel with seven spokes. When I'm experiencing burnout conditions, it's a good time to reflect and see if I'm out of balance with my:

  • Career
  • Finances
  • Social Life/Media
  • Family
  • Physical Fitness
  • Personal Development
  • Spirituality

Sometimes reviewing the list will show me that I should pay more attention to my physical fitness by taking walking breaks or going to the gym. Other times, I've found that my career or personal development segments are receiving too much attention, and I need to reduce my investment in them.

Practical advice for prevention

Like other people, I highly recommend hobbies to help prevent burnout. Though unlike most of them, I have found that given certain restraints, coding outside of work can be relaxing.

I have found that the more unrelated something is from what I'm doing at work, the better I can enjoy it. For example, when I was working in a JavaScript-heavy ASP.net application, I found coding in python to be a breath of fresh air.

When I was working on a large-scale, full-stack PHP website, exploring the Arduino platform was a welcome excursion.

As I mentioned in the last section, you need to have balance, so you might need to lay off of your hobby coding and pick up a coloring book or something for a few weeks to give your mind a chance to relax. You'll have a chance to pick back up on the coding project before too long.

In Conclusion
  • Remember that burnout can happen to anyone, and that there's no shame in talking about it.
  • Try to find the warning signs your body is giving you.
  • If you are going through a more difficult time, be open and vocal about what you're experiencing—especially with trusted friends and your manager.
  • Many thoughts triggered by anxiety and stress aren't true.
  • It's okay to take a break.
  • In the end, burnout, anxiety, depression, and the like are signs that your body is telling you to make adjustments.


February 23, 2017 01:47 PM


Ned Batchelder

A tale of two exceptions, continued

In my last blog post, A tale of two exceptions, I laid out the long drawn-out process of trying to get a certain exception to make tests skip in my test runner. I ended on a solution I liked at the time.

But it still meant having test-specific code in the product code, even if it was only a single line to set a base class for an exception. It didn't feel right to say "SkipTest" in the product code, even once.

In that blog post, I said,

One of the reasons I write this stuff down is because I'm hoping to get feedback that will improve my solution, or advance my understanding. ... a reader might object and say, "you should blah blah blah."

Sure enough, Ionel said,

A better way is to handle this in coverage's test suite. Possible solution: wrap all your tests in a decorator that reraises with a SkipException.

I liked this idea. The need was definitely a testing need, so it should be handled in the tests. First I tried doing something with pytest to get it to do the conversion of exceptions for me. But I couldn't find a way to make it work.

So: how to decorate all my tests? The decorator itself is fairly simple. Just call the method with all the arguments, and return its value, but if it raises StopEverything, then raise SkipTest instead:

def convert_skip_exceptions(method):
    """A decorator for test methods to convert StopEverything to SkipTest."""
    def wrapper(*args, **kwargs):
        """Run the test method, and convert exceptions."""
        try:
            result = method(*args, **kwargs)
        except StopEverything:
            raise unittest.SkipTest("StopEverything!")
        return result
    return wrapper

But decorating all the test methods would mean adding a @convert_skip_exceptions line to hundreds of test methods, which I clearly was not going to do. I could use a class decorator, which meant I would only have to add a decorator line to dozens of classes. That also felt like too much to do and remember to do in the future when I write new test classes.

It's not often I say this, but: it was time for a metaclass. Metaclasses are one of the darkest magics Python has, and they can be mysterious. At heart, they are simple, but in a place you don't normally think to look. Just as a class is used to make objects, a metaclass is used to make classes. Since there's something I want to do every time I make a new class (decorate its methods), a metaclass gives me the tools to do it.

class SkipConvertingMetaclass(type):
    """Decorate all test methods to convert StopEverything to SkipTest."""
    def __new__(mcs, name, bases, attrs):
        for attr_name, attr_value in attrs.items():
            right_name = attr_name.startswith('test_')
            right_type = isinstance(attr_value, types.FunctionType)
            if right_name and right_type:
                attrs[attr_name] = convert_skip_exceptions(attr_value)

        return super(SkipConvertingMetaclass, mcs).__new__(mcs, name, bases, attrs)

There are details here that you can skip as incantations if you like. Classes are all instances of "type", so if we want to make a new thing that makes classes, it derives from type to get those same behaviors. The method that gets called when a new class is made is __new__. It gets passed the metaclass itself (just as classmethods get cls and instance methods get self), the name of the class, the tuple of base classes, and a dict of all the names and values defining the class (the methods, attributes, and so on).

The important part of this metaclass is what happens in the __new__ method. We look at all the attributes being defined on the class. If the name starts with "test_", and it's a function, then it's a test method, and we decorate the value with our decorator. Remember that @-syntax is just a shorthand for passing the function through the decorator, which we do here the old-fashioned way.

Then we use super to let the usual class-defining mechanisms in "type" do their thing. Now all of our test methods are decorated, with no explicit @-lines in the code. There's only one thing left to do: make sure all of our test classes use the metaclass:

CoverageTestMethodsMixin = SkipConvertingMetaclass('CoverageTestMethodsMixin', (), {})

class CoverageTest(
    ... some other mixins ...
    CoverageTestMethodsMixin,
    unittest.TestCase,
):
    """The base class for all coverage.py test classes."""

Metaclasses make classes, just the way classes make instances: you call them. Here we call ours with the arguments it needs (class name, base classes, and attributes) to make a class called CoverageTestMethodsMixin.

Then we use CoverageTestMethodsMixin as one of the base classes of CoverageTest, which is the class used to derive all of the actual test classes.

Pro tip: if you are using unittest-style test classes, make a single class to be the base of all of your test classes, you will be glad.

After all of these class machinations, what have we got? Our test classes all derive from a base class whose metaclass decorates all the test methods. As a result, any test that raises StopEverything will raise SkipTest to the test runner instead, and the test will be skipped. There's now no mention of SkipTest in the product code at all. Better.
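Putting the pieces together, the whole pattern can be condensed into a runnable sketch (simplified, hypothetical names; this is independent of coverage.py's actual code):

```python
import types
import unittest

class StopEverything(Exception):
    """Stand-in for the product exception from the post."""

def convert_skip_exceptions(method):
    """Decorate a test method to convert StopEverything to SkipTest."""
    def wrapper(*args, **kwargs):
        try:
            return method(*args, **kwargs)
        except StopEverything:
            raise unittest.SkipTest("StopEverything!")
    return wrapper

class SkipConvertingMetaclass(type):
    """Wrap every test_* function defined on the class."""
    def __new__(mcs, name, bases, attrs):
        for attr_name, attr_value in list(attrs.items()):
            if attr_name.startswith('test_') and isinstance(attr_value, types.FunctionType):
                attrs[attr_name] = convert_skip_exceptions(attr_value)
        return super(SkipConvertingMetaclass, mcs).__new__(mcs, name, bases, attrs)

TestMethodsMixin = SkipConvertingMetaclass('TestMethodsMixin', (), {})

class DemoTest(TestMethodsMixin, unittest.TestCase):
    def test_stopped(self):
        raise StopEverything()  # converted to SkipTest by the metaclass

# Run the test and observe that it was skipped, not errored
suite = unittest.defaultTestLoader.loadTestsFromTestCase(DemoTest)
result = unittest.TestResult()
suite.run(result)
```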

February 23, 2017 12:36 PM


Experienced Django

Python, Snowboarding, and time

It’s mid-snowboard season here in North America, so time for learning, researching and writing is scarce.  I wanted to post to keep the blog alive and to give a shout to a friend who’s starting up a business selling some cool python swag.

I’m not in marketing so I won’t even pretend to give a pitch, other than to say I bought a mug (from coffee import *) and love it!

If you’re looking for something unique to give to a python geek (like yourself!) check it out:

https://nerdlettering.com/collections

OK.  I promise this blog won’t turn into a sales platform AND I promise to be back in the spring with more learning in python and django.  But for now, let it SNOW!!!!

February 23, 2017 02:07 AM

February 22, 2017


Eray Özkural (examachine)

Appreciation of Python's Elegance: Metaclasses and Design Patterns

Sometimes, it amazes me how elegant Python is. A class system that is almost as powerful as CLOS, yet easy to use and comprehend. It is truly one of the most powerful programming systems in the world, if you show

February 22, 2017 10:52 PM


Python Engineering at Microsoft

Python support in Visual Studio 2017

Over the last few months, Visual Studio 2017 has been in preview and many of you have been trying it out and providing feedback. We are very appreciative of everyone who has taken the time to do this.

As many noticed, during an update in January we removed Python support from the VS 2017 Release Candidate. This was done suddenly and without warning, and for that we apologize. We will be making a preview available at launch, and Python support will return to the main VS 2017 installer in an update.

Visual Studio 2017 RC installer showing Python development workload

I want to be clear that Python support will be returning in one of the first VS 2017 updates. We removed it only because we were not going to meet product-completeness targets that are needed to be a core part of the initial Visual Studio release, and are already well on our way to completing these. Specifically, we needed to translate our user interfaces and messages into the set of languages supported by Visual Studio in time for the main release. As anyone who has attempted to provide software with multiple languages will know, this is a unique challenge that requires changes throughout the entire project. We were not confident that we could do that and also resolve any new issues in time for March 7.

In the past, we released standalone installers to add Python support into Visual Studio. The extensibility changes in VS 2017 made simply going back to a standalone installer expensive, and this work would be thrown away when Python support is integrated directly into Visual Studio.

So here’s what we’re doing: around the time of the Visual Studio 2017 release, we will release a separate preview version of VS 2017 that includes Python support. We are currently planning for simultaneous release (March 7), but the stable release is the highest priority and plans for the preview may change.

During the Visual Studio 2017 online release event, there will be more details announced. However, here is what we can tell you so far:

Currently we are expecting Python support to be in the preview release for a few months, depending on our confidence in stability and user feedback. Once we move from preview to release, there will be an update and you’ll be able to select the Python workload in the stable release of Visual Studio. If you want to keep receiving previews of Python and other VS work, you can keep the preview installed, or you can delete it.

We want to thank everyone for your patience and sticking with us as we get ready for release. In many ways, Visual Studio 2017 is going to be our best release yet, and we are looking forward to being able to let all of you use it.

February 22, 2017 08:00 PM


Catalin George Festila

The twill python module with Fedora 25.

Today I tested the twill python module with python 2.7 and Fedora 25.
twill is a scripting system for automating web browsing, useful for testing web pages or grabbing data from password-protected sites automatically.
To install this python module I used pip command:
[root@localhost mythcat]# pip install twill
Collecting twill
Downloading twill-1.8.0.tar.gz (176kB)
100% |████████████████████████████████| 184kB 2.5MB/s
Installing collected packages: twill
Running setup.py install for twill ... done
Successfully installed twill-1.8.0

Let's try some tests:

[mythcat@localhost ~]$ python
Python 2.7.13 (default, Jan 12 2017, 17:59:37)
[GCC 6.3.1 20161221 (Red Hat 6.3.1-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from twill import get_browser
>>> b = get_browser()
>>>
>>> from twill.commands import *
>>> go("http://www.python.org/")
==> at https://www.python.org/
u'https://www.python.org/'
>>> b.showforms()

Form #1
## ## __Name__________________ __Type___ __ID________ __Value__________________
1 q search id-searc ...
To talk to the Web browser directly, call the get_browser function.
You can see most of the twill commands by using:
>>> import twill.shell
>>> twill.shell.main()

-= Welcome to twill! =-

current page: https://www.python.org/widgets
>> ?

Undocumented commands:
======================
add_auth fa info save_html title
add_extra_header find load_cookies setglobal url
agent follow notfind setlocal
back formaction redirect_error show
clear_cookies formclear redirect_output show_cookies
clear_extra_headers formfile reload show_extra_headers
code formvalue reset_browser showforms
config fv reset_error showhistory
debug get_browser reset_output showlinks
echo getinput run sleep
exit getpassword runfile submit
extend_with go save_cookies tidy_ok

current page: https://www.python.org/widgets
>>
Basic usage is to set values with setlocal, fill website forms, and navigate with the go function, which can be very handy for some tasks.
twill also provides a simple wrapper for mechanize functionality, though that API is still unstable.

February 22, 2017 03:37 PM


DataCamp

New Course: Unsupervised Learning in Python

Today we're launching our new course Unsupervised Learning in Python by Benjamin Wilson. Say you have a collection of customers with a variety of characteristics such as age, location, and financial history, and you wish to discover patterns and sort them into clusters. Or perhaps you have a set of texts, such as wikipedia pages, and you wish to segment them into categories based on their content. This is the world of unsupervised learning, called as such because you are not guiding, or supervising, the pattern discovery by some prediction task, but instead uncovering hidden structure from unlabeled data. Unsupervised learning encompasses a variety of techniques in machine learning, from clustering to dimension reduction to matrix factorization.

In this course, you'll learn the fundamentals of unsupervised learning and implement the essential algorithms using scikit-learn and scipy. You will learn how to cluster, transform, visualize, and extract insights from unlabeled datasets, and end the course by building a recommender system to recommend popular musical artists.

Start for free

Unsupervised Learning in Python features interactive exercises that combine high-quality video, in-browser coding, and gamification for an engaging learning experience that will make you a master at Data Science with Python!

What you'll learn

In the first chapter, you'll learn how to discover the underlying groups (or "clusters") in a dataset. By the end of the chapter, you'll be clustering companies using their stock market prices, and distinguishing different species by clustering their measurements. Start first chapter for free here.
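Clustering with scikit-learn takes only a few lines. Here is a toy sketch with made-up data (not course material), in the spirit of that first chapter:

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data: two well-separated blobs of 2-D points
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.3, size=(20, 2)),   # blob around (0, 0)
    rng.normal(5.0, 0.3, size=(20, 2)),   # blob around (5, 5)
])

# Fit k-means with 2 clusters and assign a label to each point
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(X)
# Each blob ends up with a single, distinct cluster label
```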

In the second you'll learn about two unsupervised learning techniques for data visualization, hierarchical clustering and t-SNE.

The third chapter is about the most fundamental of dimension reduction techniques, "Principal Component Analysis" ("PCA"). PCA is often used before supervised learning to improve model performance and generalization.

Finally, you'll end the course by learning about a dimension reduction technique called "Non-negative matrix factorization" ("NMF") that expresses samples as combinations of interpretable parts. For example, it expresses documents as combinations of topics, and images in terms of commonly occurring visual patterns.

About Ben: Ben is a machine learning specialist and the director of research at lateral.io. He is passionate about learning and has worked as a data scientist in real-time bidding, e-commerce, and recommendation. Ben holds a PhD in mathematics and a degree in computer science.

Start course for free

February 22, 2017 03:06 PM


PyCharm

PyCharm 2017.1 EAP 8 (build 171.3566.25)

Our eighth Early Access Program (EAP) release for PyCharm 2017.1 is out now! Get it from our EAP page.


See our release notes for more details about improvements in this version.

Any improvements marked ‘Pro only’ are only available in PyCharm Professional Edition. You can use the EAP version of PyCharm Professional Edition for free for 30 days.

We’d like to encourage you to try out this new EAP version. To keep up to date with our EAP releases, set your update channel to Early Access Program: Settings | Appearance & Behavior | System Settings | Updates, then set “Automatically check updates for” to “Early Access Program”.

We do our best to find all bugs before we release, but in these preview builds there might still be some bugs in the product. If you find one, please let us know on YouTrack, or contact us on Twitter @PyCharm.

-PyCharm Team
The Drive to Develop

 

February 22, 2017 02:57 PM


Wesley Chun

Adding text & shapes with the Google Slides API

NOTE: Watch for this blog post (coming soon) as a video in this series.

Introduction

This is the fourth entry highlighting primary use cases of the Google Slides API with Python; check back in the archives to access the first three. Today, we're focused on some of the basics, like adding text to slides. We'll also cover adding shapes, and as a bonus, adding text into shapes!

Using the Google Slides API

The demo script requires creating a new slide deck (and adding a new slide), so you need the read-write scope for Slides:

https://www.googleapis.com/auth/presentations

If you're new to using Google APIs, we recommend reviewing earlier posts & videos covering setting up projects and the authorization boilerplate so that we can focus on the main app. Once we've authorized our app, assume you have a service endpoint to the API and have assigned it to the SLIDES variable.

Create new presentation & get its objects' IDs

A new slide deck can be created with SLIDES.presentations().create()—or alternatively with the Google Drive API which we won't do here. From the API response, we save the new deck's ID along with the IDs of the title and subtitle textboxes on the default title slide:
rsp = SLIDES.presentations().create(
body={'title': 'Adding text formatting DEMO'}).execute()
deckID = rsp['presentationId']
titleSlide = rsp['slides'][0] # title slide object IDs
titleID = titleSlide['pageElements'][0]['objectId']
subtitleID = titleSlide['pageElements'][1]['objectId']
The title slide has only two elements on it, the title and subtitle textboxes, returned in that order, which is why we grab them at indexes 0 and 1 respectively.

Generating our own unique object IDs

In the next steps, we generate our own unique object IDs. We'll first explain what those objects are, then why you'd want to create your own object IDs rather than letting the API create default IDs for them.

As we've done in previous posts on the Slides API, we create one new slide with the "main point" layout. It has one notable object, a "large-ish" textbox, and nothing else. We'll create IDs for the slide itself and for its textbox. Next, we'll (use the API to) "draw" 3 shapes on this slide, so we'll create IDs for each of those: 5 (document-unique) IDs total. Now let's discuss why you'd "roll your own" IDs.

Why and how to generate our own IDs

It's advantageous for all developers to minimize the overall number of calls to Google APIs. While most of the services provided through the APIs are free, they have quotas to prevent abuse (see the Slides API quotas page). So how does creating our own IDs help reduce API calls?

Passing in object IDs is optional for "create" calls. Providing your own ID lets you create an object and modify it using additional requests within the same API call to SLIDES.presentations().batchUpdate(). If you don't provide your own object IDs, the API will generate a unique one for you.

Unfortunately, relying on API-generated IDs means that instead of one API call, you'll need one call to create the object, likely another to get that object and determine its ID, and yet another to update that object using the ID you just fetched. Separate create, get, and update calls means (at least) 3x more API traffic than providing your own IDs (where create & update fit in a single API call; no get necessary).

Here are a few things to know when rolling your own IDs:

You'll somehow need to ensure your IDs are unique, or use UUIDs (universally unique identifiers), for which most languages have libraries. For example, Java developers can use java.util.UUID.randomUUID().toString(), while Python users can import the uuid module plus a little extra work to get UUID string values:
import uuid
gen_uuid = lambda : str(uuid.uuid4()) # get random UUID string
Finally, be aware that if an object is modified in the UI, its ID may change. For more information, review the "Working with object IDs" section in the Slides API Overview page.

Back to sample app

All that said, let's go back to the code and generate those 5 random object IDs we promised earlier:
mpSlideID   = gen_uuid() # mainpoint IDs
mpTextboxID = gen_uuid()
smileID = gen_uuid() # shape IDs
str24ID = gen_uuid()
arwbxID = gen_uuid()
With that, we're ready to create the requests array (reqs) to send to the API.

Create "main point" slide

The first request creates the "main point" slide...
reqs = [
{'createSlide': {
'objectId': mpSlideID,
'slideLayoutReference': {'predefinedLayout': 'MAIN_POINT'},
'placeholderIdMappings': [{
'objectId': mpTextboxID,
'layoutPlaceholder': {'type': 'TITLE', 'index': 0}
}],
}},
...where placeholderIdMappings lists the page elements created on the new slide (which depend, obviously, on the layout chosen); "main point" has only the one textbox, which is why placeholderIdMappings has a single entry.

Add title slide and main point textbox text

The next requests fill in the title & subtitle in the default title slide and also the textbox on the main point slide.
{'insertText': {'objectId': titleID, 'text': 'Adding text and shapes'}},
{'insertText': {'objectId': subtitleID, 'text': 'via the Google Slides API'}},
{'insertText': {'objectId': mpTextboxID, 'text': 'text & shapes'}},
The first two requests use IDs that were generated by the Slides API when the presentation was created, while the main point textbox ID was generated by us.

Create three shapes

Above, we created IDs for three shapes, a "smiley face," a 24-point star, and a double arrow box (smileID, str24ID, arwbxID). The request for the first looks like this:
{'createShape': {
'objectId': smileID,
'shapeType': 'SMILEY_FACE',
'elementProperties': {
"pageObjectId": mpSlideID,
'size': {
'height': {'magnitude': 3000000, 'unit': 'EMU'},
'width': {'magnitude': 3000000, 'unit': 'EMU'}
},
'transform': {
'unit': 'EMU', 'scaleX': 1.3449, 'scaleY': 1.3031,
'translateX': 4671925, 'translateY': 450150,
},
},
}}
The JSON for the other two shapes is similar; the only differences are the object ID, the shapeType, and the transform. You can see the corresponding requests in the full source code at the bottom of this post, so we won't display them here as the descriptions would be nearly identical.

Size & transform for slide objects

When placing or manipulating objects on slides, the key element properties you must provide are sizes and transforms. You must either compute these with some math or derive them from pre-existing objects. Resizing, rotating, and similar operations require some basic knowledge of matrix math. Take a look at the Page Elements page in the official docs as well as the Transforms concept guide for more details.
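For reference (not from the post itself): the magnitudes above are EMU, English Metric Units, a fixed unit shared by Google and Office document formats, with 914,400 EMU per inch and 12,700 EMU per point. A tiny helper, sketched below, makes values like 3000000 easier to reason about:

```python
# EMU (English Metric Units): 914400 per inch, 12700 per point.
EMU_PER_INCH = 914400
EMU_PER_POINT = 12700

def inches_to_emu(inches):
    """Convert inches to EMU for 'size'/'transform' magnitudes."""
    return int(round(inches * EMU_PER_INCH))

def points_to_emu(points):
    """Convert typographic points to EMU."""
    return int(round(points * EMU_PER_POINT))

print(inches_to_emu(1))   # 914400
print(points_to_emu(72))  # 914400 (72 pt = 1 inch)
```

So the 3000000 EMU height/width used for the shapes above is roughly 3.28 inches.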

Deriving from pre-existing objects: maybe you're short on time, don't want to deal with the math, or are thinking something like, "Geez, I just want to draw a smiley face on a slide." One common pattern, then, is to bring up the Slides UI, create a blank slide, and place your image or draw your shape the way you want, with the size you want, exactly where you want it. For example:


Once you have that desired shape (and size and location), you can use the API (either presentations.get or presentations.pages.get) to read that object's size and transform then drop both of those into your application so the API creates a new shape in the exact way, mirroring what you created in the UI. For the smiley face above, the JSON payload we got back from one of the "get" calls could look something like:

If you scroll back up to the createShape request, you'll see we used those exact values. Note: because the 3 shapes are all in different locations and sizes, expect the corresponding values for each shape to be different.

Bonus: adding text to shapes

Now that you know how to add text and shapes, it's only fitting that we show you how to add text into shapes. The good news is that the technique is no different than adding text to textboxes or even tables. So with the shape IDs, our final set of requests along with the batchUpdate() call looks like this:
    {'insertText': {'objectId': smileID, 'text': 'Put the nose somewhere here!'}},
{'insertText': {'objectId': str24ID, 'text': 'Count 24 points on this star!'}},
{'insertText': {'objectId': arwbxID, 'text': "An uber bizarre arrow box!"}},
] # end of 'reqs'
SLIDES.presentations().batchUpdate(body={'requests': reqs},
presentationId=deckID).execute()

Conclusion

If you run the script, you should get output that looks something like this, with each print() representing each API call:
$ python3 slides_shapes_text.py 
** Create new slide deck & set up object IDs
** Create "main point" slide, add text & interesting shapes
DONE
When the script has completed, you should have a new presentation with a title slide and a main point slide with shapes which should look something like this:

Below is the entire script for your convenience which runs on both Python 2 and Python 3 (unmodified!)—by using, copying, and/or modifying this code or any other piece of source from this blog, you implicitly agree to its Apache2 license:
from __future__ import print_function
import uuid

from apiclient import discovery
from httplib2 import Http
from oauth2client import file, client, tools

gen_uuid = lambda : str(uuid.uuid4()) # get random UUID string

SCOPES = 'https://www.googleapis.com/auth/presentations',
store = file.Storage('storage.json')
creds = store.get()
if not creds or creds.invalid:
flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
creds = tools.run_flow(flow, store)
SLIDES = discovery.build('slides', 'v1', http=creds.authorize(Http()))

print('** Create new slide deck & set up object IDs')
rsp = SLIDES.presentations().create(
body={'title': 'Adding text & shapes DEMO'}).execute()
deckID = rsp['presentationId']
titleSlide = rsp['slides'][0] # title slide object IDs
titleID = titleSlide['pageElements'][0]['objectId']
subtitleID = titleSlide['pageElements'][1]['objectId']
mpSlideID = gen_uuid() # mainpoint IDs
mpTextboxID = gen_uuid()
smileID = gen_uuid() # shape IDs
str24ID = gen_uuid()
arwbxID = gen_uuid()

print('** Create "main point" slide, add text & interesting shapes')
reqs = [
# create new "main point" layout slide, giving slide & textbox IDs
{'createSlide': {
'objectId': mpSlideID,
'slideLayoutReference': {'predefinedLayout': 'MAIN_POINT'},
'placeholderIdMappings': [{
'objectId': mpTextboxID,
'layoutPlaceholder': {'type': 'TITLE', 'index': 0}
}],
}},
# add title & subtitle to title slide; add text to main point slide textbox
{'insertText': {'objectId': titleID, 'text': 'Adding text and shapes'}},
{'insertText': {'objectId': subtitleID, 'text': 'via the Google Slides API'}},
{'insertText': {'objectId': mpTextboxID, 'text': 'text & shapes'}},
# create smiley face
{'createShape': {
'objectId': smileID,
'shapeType': 'SMILEY_FACE',
'elementProperties': {
"pageObjectId": mpSlideID,
'size': {
'height': {'magnitude': 3000000, 'unit': 'EMU'},
'width': {'magnitude': 3000000, 'unit': 'EMU'}
},
'transform': {
'unit': 'EMU', 'scaleX': 1.3449, 'scaleY': 1.3031,
'translateX': 4671925, 'translateY': 450150,
},
},
}},
# create 24-point star
{'createShape': {
'objectId': str24ID,
'shapeType': 'STAR_24',
'elementProperties': {
"pageObjectId": mpSlideID,
'size': {
'height': {'magnitude': 3000000, 'unit': 'EMU'},
'width': {'magnitude': 3000000, 'unit': 'EMU'}
},
'transform': {
'unit': 'EMU', 'scaleX': 0.7079, 'scaleY': 0.6204,
'translateX': 2036175, 'translateY': 237350,
},
},
}},
# create double left & right arrow w/textbox
{'createShape': {
'objectId': arwbxID,
'shapeType': 'LEFT_RIGHT_ARROW_CALLOUT',
'elementProperties': {
"pageObjectId": mpSlideID,
'size': {
'height': {'magnitude': 3000000, 'unit': 'EMU'},
'width': {'magnitude': 3000000, 'unit': 'EMU'}
},
'transform': {
'unit': 'EMU', 'scaleX': 1.1451, 'scaleY': 0.4539,
'translateX': 1036825, 'translateY': 3235375,
},
},
}},
# add text to all 3 shapes
{'insertText': {'objectId': smileID, 'text': 'Put the nose somewhere here!'}},
{'insertText': {'objectId': str24ID, 'text': 'Count 24 points on this star!'}},
{'insertText': {'objectId': arwbxID, 'text': "An uber bizarre arrow box!"}},
]
SLIDES.presentations().batchUpdate(body={'requests': reqs},
presentationId=deckID).execute()
print('DONE')
As with our other code samples, you can now customize it to learn more about the API, or integrate it into other apps for your own needs: a mobile frontend, a sysadmin script, or a server-side backend!

Code challenge

Create a 2x3 or 3x4 table on a slide and add text to each "cell." This should be a fairly easy exercise, especially if you look at the Table Operations documentation. HINT: you'll be using insertText with just an extra field, cellLocation. EXTRA CREDIT: generalize your solution so that you're grabbing cells from a Google Sheet and "import" them into a table on a slide. HINT: look for an earlier post where we describe how to create slides from spreadsheet data.
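As a nudge rather than a full solution, the shape of such a request (per the hint) might look like the sketch below; the table ID here is hypothetical, and would really come from your own createTable request or a presentations.get call:

```python
# Hypothetical table object ID, for illustration only.
tableID = 'MY_TABLE_ID'

# insertText targets the *table* object; the extra cellLocation field
# (rowIndex/columnIndex) picks which cell receives the text.
cell_req = {'insertText': {
    'objectId': tableID,
    'cellLocation': {'rowIndex': 0, 'columnIndex': 1},
    'text': 'row 0, column 1',
}}
print(cell_req['insertText']['cellLocation'])
```

You'd append one such request per cell to your batchUpdate() requests array.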

February 22, 2017 02:40 PM


Caktus Consulting Group

Python type annotations

When it comes to programming, I have a belt and suspenders philosophy. Anything that can help me avoid errors early is worth looking into.

The type annotation support that's been gradually added to Python is a good example. Here's how it works and how it can be helpful.

Introduction

The first important point is that the new type annotation support has no effect at runtime. Adding type annotations in your code has no risk of causing new runtime errors: Python is not going to do any additional type-checking while running.

Instead, you'll be running separate tools to type-check your programs statically during development. I say "separate tools" because there's no official Python type checking tool, but there are several third-party tools available.

So, if you choose to use the mypy tool, you might run:

$ mypy my_code.py

and it might warn you that a function that was annotated as expecting string arguments was going to be called with an integer.

Of course, for this to work, you have to be able to add information to your code to let the tools know what types are expected. We do this by adding "annotations" to our code.

One approach is to put the annotations in specially-formatted comments. The obvious advantage is that you can do this in any version of Python, since it doesn't require any changes to the Python syntax. The disadvantages are the difficulty of writing these annotations correctly, and the corresponding difficulty of parsing them in the tools.

To help with this, Python 3.0 added support for adding annotations to functions (PEP-3107), though without specifying any semantics for the annotations. Python 3.6 adds support for annotations on variables (PEP-526).

Two additional PEPs, PEP-483 and PEP-484, define how annotations can be used for type-checking.

Since I try to write all new code in Python 3, I won't say any more about putting annotations in comments.

Getting started

Enough background, let's see what all this looks like.

Python 3.6 was just released, so I’ll be using it. I'll start with a new virtual environment, and install the type-checking tool mypy (whose package name is mypy-lang).:

$ virtualenv -p $(which python3.6) try_types
$ . try_types/bin/activate
$ pip install mypy-lang

Let's see how we might use this when writing some basic string functions. Suppose we're looking for a substring inside a longer string. We might start with:

def search_for(needle, haystack):
    offset = haystack.find(needle)
    return offset

If we were to call this with anything that's not text, we'd consider it an error. To help us avoid that, let's annotate the arguments:

def search_for(needle: str, haystack: str):
    offset = haystack.find(needle)
    return offset

Does Python care about this?:

$ python search1.py
$

Python is happy with it. There's not much yet for mypy to check, but let's try it:

$ mypy search1.py
$

In both cases, no output means everything is okay.

(Aside: mypy uses information from the files and directories on its command line plus all packages they import, but it only does type-checking on the files and directories on its command line.)

So far, so good. Now, let's call our function with a bad argument by adding this at the end:

search_for(12, "my string")

If we tried to run this, it wouldn't work:

$ python search2.py
Traceback (most recent call last):
    File "search2.py", line 4, in <module>
        search_for(12, "my string")
    File "search2.py", line 2, in search_for
        offset = haystack.find(needle)
TypeError: must be str, not int

In a more complicated program, we might not have run that line of code until sometime when it would be a real problem, and so wouldn't have known it was going to fail. Instead, let's check the code immediately:

$ mypy search2.py
search2.py:4: error: Argument 1 to "search_for" has incompatible type "int"; expected "str"

Mypy spotted the problem for us and explained exactly what was wrong and where.

We can also indicate the return type of our function:

def search_for(needle: str, haystack: str) -> str:
    offset = haystack.find(needle)
    return offset

and ask mypy to check it:

$ mypy search3.py
search3.py: note: In function "search_for":
search3.py:3: error: Incompatible return value type (got "int", expected "str")

Oops, we're actually returning an integer but we said we were going to return a string, and mypy was smart enough to work that out. Let's fix that:

def search_for(needle: str, haystack: str) -> int:
    offset = haystack.find(needle)
    return offset

And see if it checks out:

$ mypy search4.py
$

Now, maybe later on we forget just how our function works, and try to use the return value as a string:

x = len(search_for('the', 'in the string'))

Mypy will catch this for us:

$ mypy search5.py
search5.py:5: error: Argument 1 to "len" has incompatible type "int"; expected "Sized"

We can't call len() on an integer. Mypy wants something of type Sized -- what's that?

More complicated types

The built-in types will only take us so far, so Python 3.5 added the typing module, which both gives us a bunch of new names for types, and tools to build our own types.

In this case, typing.Sized represents anything with a __len__ method, which is the only kind of thing we can call len() on.
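For instance, any class with a __len__ method counts as Sized, so mypy will accept it anywhere Sized is expected. A small sketch (the Deck class is just an illustration):

```python
from typing import Sized

class Deck:
    """Has a __len__ method, so it satisfies typing.Sized."""
    def __init__(self, cards):
        self.cards = list(cards)

    def __len__(self) -> int:
        return len(self.cards)

def describe(container: Sized) -> str:
    return '%d item(s)' % len(container)

print(describe(Deck(['A', 'K', 'Q'])))  # 3 item(s)
print(describe('hello'))                # 5 item(s) -- strings are Sized too
```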

Let's write a new function that'll return a list of the offsets of all of the instances of some string in another string. Here it is:

from typing import List

def multisearch(needle: str, haystack: str) -> List[int]:
    # Not necessarily the most efficient implementation
    offset = haystack.find(needle)
    if offset == -1:
        return []
    return [offset] + multisearch(needle, haystack[offset+1:])

Look at the return type: List[int]. You can define a new type, a list of a particular type of elements, by saying List and then adding the element type in square brackets.

There are a number of these - e.g. Dict[keytype, valuetype] - but I'll let you read the documentation to find these as you need them.

mypy passed the code above, but suppose we had accidentally had it return None when there were no matches:

def multisearch(needle: str, haystack: str) -> List[int]:
    # Not necessarily the most efficient implementation
    offset = haystack.find(needle)
    if offset == -1:
        return None
    return [offset] + multisearch(needle, haystack[offset+1:])

mypy should spot that there's a case where we don't return a list of integers, like this:

$ mypy search6.py
$

Uh-oh - why didn't it spot the problem here? It turns out that by default, mypy considers None compatible with everything. To my mind, that's wrong, but luckily there's an option to change that behavior:

$ mypy --strict-optional search6.py
search6.py: note: In function "multisearch":
search6.py:7: error: Incompatible return value type (got None, expected List[int])

I shouldn't have to remember to add that to the command line every time, though, so let's put it in a configuration file just once. Create mypy.ini in the current directory and put in:

[mypy]
strict_optional = True

And now:

$ mypy search6.py
search6.py: note: In function "multisearch":
search6.py:7: error: Incompatible return value type (got None, expected List[int])

But speaking of None, it's not uncommon to have functions that can either return a value or None. We might change our search_for method to return None if it doesn't find the string, instead of -1:

def search_for(needle: str, haystack: str) -> int:
    offset = haystack.find(needle)
    if offset == -1:
        return None
    else:
        return offset

But now we don't always return an int and mypy will rightly complain:

$ mypy search7.py
search7.py: note: In function "search_for":
search7.py:4: error: Incompatible return value type (got None, expected "int")

When a method can return different types, we can annotate it with a Union type:

from typing import Union

def search_for(needle: str, haystack: str) -> Union[int, None]:
    offset = haystack.find(needle)
    if offset == -1:
        return None
    else:
        return offset

There's also a shortcut, Optional, for the common case of a value being either some type or None:

from typing import Optional

def search_for(needle: str, haystack: str) -> Optional[int]:
    offset = haystack.find(needle)
    if offset == -1:
        return None
    else:
        return offset
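With strict_optional enabled, mypy also makes callers handle the None case before treating the result as an int; an explicit check narrows Optional[int] down to int. A self-contained sketch (restating search_for so the snippet runs on its own):

```python
from typing import Optional

def search_for(needle: str, haystack: str) -> Optional[int]:
    offset = haystack.find(needle)
    return None if offset == -1 else offset

result = search_for('the', 'in the string')
if result is not None:   # narrows Optional[int] to int, so mypy is satisfied
    print('found at offset', result)
else:
    print('not found')
```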

Wrapping up

I've barely touched the surface, but you get the idea.

One nice thing is that the Python libraries are all annotated for us already. You might have noticed above that mypy knew that calling find on a str returns an int - that's because str.find is already annotated. So you can get some benefit just by calling mypy on your code without annotating anything at all -- mypy might spot some misuses of the libraries for you.

For more reading:

February 22, 2017 01:00 PM


DataCamp

Pandas Cheat Sheet for Data Science in Python

The Pandas library is one of the most preferred tools for data scientists to do data manipulation and analysis, next to matplotlib for data visualization and NumPy, the fundamental library for scientific computing in Python on which Pandas was built. 

The fast, flexible, and expressive Pandas data structures are designed to make real-world data analysis significantly easier, but this might not be immediately the case for those who are just getting started with it, precisely because there is so much functionality built into this package that the options can be overwhelming.

That's where this Pandas cheat sheet might come in handy. 

It's a quick guide through the basics of Pandas that you will need to get started on wrangling your data with Python. 

As such, you can use it as a handy reference if you are just beginning your data science journey with Pandas or, if you haven't started yet, as a guide to make it easier to learn about and use the package.

Python Pandas Cheat Sheet

The Pandas cheat sheet will guide you through the basics of the Pandas library, going from the data structures to I/O, selection, dropping indices or columns, sorting and ranking, retrieving basic information of the data structures you're working with to applying functions and data alignment.

In short, everything that you need to kickstart your data science learning with Python!

Do you want to learn more? Start the Intermediate Python For Data Science course for free now or try out our Pandas DataFrame tutorial

Also, don't miss out on our Bokeh cheat sheet for data visualization in Python and our Python cheat sheet for data science

February 22, 2017 12:52 PM


Talk Python to Me

#100 Python past, present, and future with Guido van Rossum

Welcome to a very special episode. This is the 100th episode of Talk Python To Me. It's the perfect chance to take a moment and look at where we have come from, and where we are going. Not just with regard to the podcast but for Python in general.

And who better to do this than Python's inventor himself, Guido van Rossum. In this episode, we discuss how Guido got into programming, where Python came from and why, and Python's bright future with Python 3.

Links from the show:

Guido on Twitter: https://twitter.com/gvanrossum
What's New In Python 3.6: https://docs.python.org/3/whatsnew/3.6.html
mypy: http://mypy-lang.org/

Sponsored items

Rollbar: https://rollbar.com/talkpythontome
Hired: https://hired.com/talkpythontome
Our courses: https://training.talkpython.fm/
Podcast's Patreon: https://www.patreon.com/mkennedy

February 22, 2017 08:00 AM

February 21, 2017


DSPIllustrations.com

Circular Convolution Example

Circular Convolution

In a previous post, we have explained the importance of the convolution operation for signal processing and signal analysis. We have described the convolution integral and illustrated the involved functions.

In this post we will focus on an operation called Circular convolution which is strongly related to the conventional convolution (also called linear convolution) we have described before. Let us reconsider the normal, linear convolution in the discrete domain. Given two sequences x[n] and h[n], their convolution is given by

(x*h)[n] = \sum_{n'=-\infty}^{\infty}x[n']\cdot h[n-n'], \quad n=-\infty,\dots,\infty.

The linear convolution lets one sequence slide over the other and sums the overlapping parts. The circular convolution of two sequences x[n], h[n] additionally considers a wrap-around of the sequences after a period of N samples. So, the circular convolution is defined by

(x\otimes h)[n]=\sum_{n'=0}^{N-1}x[n']\cdot h[(n-n')_N], \quad n=0,\dots,N-1,

where (n-n')_N denotes the remainder of n-n' divided by N. For example, (-1)_N=N-1. This means that if the index of h[(n-n')_N] would leave the range 0,\dots,N-1 to the left, it wraps around and comes in from the right again, so the circular convolution is periodic with period N.
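The definition can be checked numerically. The sketch below uses the convention in which h is indexed by (n - n') mod N (matching the linear convolution above), and compares the direct sum against the DFT-based evaluation that the convolution theorem provides:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
h = np.array([1.0, 0.0, 0.0, 1.0])
N = len(x)

# Direct evaluation of the circular convolution sum; the modulo makes
# negative indices wrap around to the end of the sequence.
y_direct = np.array([sum(x[k] * h[(n - k) % N] for k in range(N))
                     for n in range(N)])

# Convolution theorem in the discrete domain: circular convolution
# corresponds to the elementwise product of the DFTs.
y_dft = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)))

print(y_direct)                       # [3. 5. 7. 5.]
print(np.allclose(y_direct, y_dft))   # True
```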

In discrete domain, the convolution theorem actually holds only for the circular ...

February 21, 2017 11:00 PM


William Minchin

Summary Plugin 1.1.0 for Pelican Released

Summary is a plugin for Pelican, a static site generator written in Python.

Summary allows easy, variable length summaries directly embedded into the body of your articles.

Installation

The easiest way to install Summary is through the use of pip. This will also install the required dependencies automatically (currently none beyond Pelican).

pip install minchin.pelican.plugins.summary

Then, in your pelicanconf.py file, add Summary to your list of plugins:

PLUGINS = [
          # ...
          'minchin.pelican.plugins.summary',
          # ...
        ]

You may also need to configure the summary start and end markers (see Configuration below).

Configuration and Usage

This plugin introduces two new settings: SUMMARY_BEGIN_MARKER and SUMMARY_END_MARKER: strings which can be placed directly into an article to mark the beginning and end of a summary. When found, the standard SUMMARY_MAX_LENGTH setting will be ignored. The markers themselves will also be removed from your articles before they are published. The default values are <!-- PELICAN_BEGIN_SUMMARY --> and <!-- PELICAN_END_SUMMARY -->.

If no beginning or end marker is found, and if SUMMARY_USE_FIRST_PARAGRAPH is enabled in the settings, the summary will be the first paragraph of the post.

The plugin also sets a has_summary attribute on every article. It is True for articles with an explicitly-defined summary, and False otherwise. (It is also False for an article truncated by SUMMARY_MAX_LENGTH.) Your templates can use this e.g. to add a link to the full text at the end of the summary.
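For example, a theme template could branch on has_summary; this sketch uses Pelican's standard template variables (article.summary, article.url, SITEURL), though your theme's markup will differ:

```html
{% if article.has_summary %}
    {{ article.summary }}
    <a href="{{ SITEURL }}/{{ article.url }}">Read more&hellip;</a>
{% else %}
    {{ article.content }}
{% endif %}
```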

Known Issues

One known issue is that there is no formal test suite. Testing is currently limited to my in-use observations. I also run a basic check upon uploading the package to PyPI to confirm that it can be downloaded and loaded into Python.

The package is tested on Python 3.5; compatibility with other versions of Python is unknown.

Tests are actually included and can be run from the root directory:

python minchin/pelican/plugins/summary/test_summary.py

Changes

This version is basically just repackaging the plugin and making it available on pip.

Code

The code for this project is available on GitHub. Contributions are welcome!

Credits

Original plugin from the Pelican-Plugins repo.

License

The plugin code is assumed to be under the AGPLv3 license (this is the license of the Pelican-Plugins repo).

February 21, 2017 09:51 PM


Mike Driscoll

Python 3 – Unpacking Generalizations

Python 3.5 added more support for Unpacking Generalizations in PEP 448. According to the PEP, it added extended usages of the * iterable unpacking operator and ** dictionary unpacking operators to allow unpacking in more positions, an arbitrary number of times, and in additional circumstances. What this means is that we can now make calls to functions with an arbitrary number of unpackings. Let’s take a look at a dict() example:

>>> my_dict = {'1':'one', '2':'two'}
>>> dict(**my_dict, w=6)
{'1': 'one', '2': 'two', 'w': 6}
>>> dict(**my_dict, w='three', **{'4':'four'})
{'1': 'one', '2': 'two', 'w': 'three', '4': 'four'}

Interestingly, if the keys are something other than strings, the unpacking doesn't work:

>>> my_dict = {1:'one', 2:'two'}
>>> dict(**my_dict)
Traceback (most recent call last):
  File "<pyshell#27>", line 1, in <module>
    dict(**my_dict)
TypeError: keyword arguments must be strings

Update: One of my readers was quick to point out that the reason this doesn’t work is because I was trying to unpack into a function call (i.e. dict()). If I had done the unpacking using just dict syntax, the integer keys would have worked fine. Here’s what I’m talking about:

>>> {**{1: 'one', 2:'two'}, 3:'three'}
{1: 'one', 2: 'two', 3: 'three'}

One other interesting wrinkle to dict unpacking is that later values will always override earlier ones. There’s a good example in the PEP that demonstrates this:

>>> {**{'x': 2}, 'x': 1}
{'x': 1}

I thought that was pretty neat. You can do the same sort of thing with ChainMap from the collections module, but this is quite a bit simpler.
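To see the difference, note that ChainMap resolves duplicate keys in favor of the earliest mapping, the opposite of dict unpacking, where later entries win:

```python
from collections import ChainMap

# With dict unpacking, later values override earlier ones...
merged = {**{'x': 2}, 'x': 1}

# ...while ChainMap gives priority to the *earliest* mapping holding the
# key, so the arguments go in the opposite order for the same result.
chained = dict(ChainMap({'x': 1}, {'x': 2}))

print(merged)   # {'x': 1}
print(chained)  # {'x': 1}
```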

However this new unpacking also works for tuples and lists. Let’s try combining some items of different types into one list:

>>> my_tuple = (11, 12, 45)
>>> my_list = ['something', 'or', 'other']
>>> my_range = range(5)
>>> combo = [*my_tuple, *my_list, *my_range]
>>> combo
[11, 12, 45, 'something', 'or', 'other', 0, 1, 2, 3, 4]

Before this unpacking change, you would have had to do something like this:

>>> combo = list(my_tuple) + list(my_list) + list(my_range)
>>> combo
[11, 12, 45, 'something', 'or', 'other', 0, 1, 2, 3, 4]

I think the new syntax is actually quite handy for these kinds of circumstances. I’ve actually run into this a time or two in Python 2 where this new enhancement would have been quite useful.
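Beyond literals, PEP 448 also permits multiple * and ** unpackings inside a single function call, a quick sketch:

```python
def report(*args, **kwargs):
    """Collect whatever positional and keyword arguments arrive."""
    return args, kwargs

# Several unpackings in one call -- not allowed before Python 3.5.
positional, keywords = report(*(1, 2), *[3, 4], **{'a': 5}, **{'b': 6})
print(positional)  # (1, 2, 3, 4)
print(keywords)    # {'a': 5, 'b': 6}
```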


Wrapping Up

There are lots of other examples in PEP 448 that are quite interesting to read about and try in Python’s interpreter. I highly recommend checking it out and giving this feature a try. I am hoping to start using some of these features in my new code whenever we finally move to Python 3.

February 21, 2017 06:15 PM


DataCamp

Free DataCamp for your Classroom

Announcing: DataCamp for the classroom, a new free plan for Academics.

We want to support every student that wants to learn Data Science. That is why, as of today, professors/teachers/TA’s/… can give their students 6 months of FREE access to the full DataCamp course curriculum when used in the classroom. Request your free classroom today.

1) Student Benefits

When you set up your DataCamp for the classroom account, each student will automatically have full access to the entire course curriculum (>250 hours). This includes access to premium courses such as:

In addition, students can participate in leaderboards and private discussion forums with their fellow classmates. 

2) Professor/Instructor/Teacher/... Benefits

Via DataCamp for the classroom, the instructor has access to all our handy group features: 

3) Get Started

Get started with DataCamp for the classroom and join professors from Duke University, Notre-Dame, Harvard, University College London, UC Berkeley and many more, in learning data science by doing with DataCamp.

START TODAY

February 21, 2017 02:22 PM


Chris Moffitt

Populating MS Word Templates with Python

Introduction

In a previous post, I covered one approach for generating documents using HTML templates to create a PDF. While PDF is great, the world still relies on Microsoft Word for document creation. In reality, it will be much simpler for a business user to create the desired template that supports all the custom formatting they need in Word versus trying to use HTML+CSS. Fortunately, there is a package that supports doing an MS Word mailmerge purely within python. This approach has the advantage of running on any system - even if Word is not installed. The benefit to using python for the merge (vs. an Excel sheet) is that you are not limited in how you retrieve or process the data. The full flexibility and power of the python ecosystem is at your fingertips. This should be a useful tool to keep in mind any time you need to automate document creation.

Background

The package that makes all of this possible is fittingly called docx-mailmerge. It is a mature package that can parse the MS Word docx file, find the merge fields and populate them with whatever values you need. The package also supports some helper functions for populating tables and generating single files with multiple page breaks.

The one comment I have about this package is that the term “mailmerge” evokes a very simple use case - populating multiple documents with mailing addresses. I know that the standard Word approach is to call this process a mailmerge, but this “mailmerge” can be a useful templating system for far more sophisticated solutions than just populating names and addresses in a document.

Installation

The package requires lxml, which has platform specific binary installs. I recommend using conda to install lxml and its dependencies, then using pip for the mailmerge package itself. I tested this on linux and Windows and it seems to work fine on both platforms.

conda install lxml
pip install docx-mailmerge

That’s it. Before we show how to populate the Word fields, let’s walk through creating the Word document.

Word Merge Fields

In order for docx-mailmerge to work correctly, you need to create a standard Word document and define the appropriate merge fields. The examples below are for Word 2010. Other versions of Word should be similar. It actually took me a while to figure out this process but once you do it a couple of times, it is pretty simple.

Start Word and create the basic document structure. Then place the cursor in the location where the merged data should be inserted and choose Insert -> Quick Parts -> Field...:

Word Quick Parts

From the Field dialog box, select the “MergeField” option from the Field Names list. In the Field Name, enter the name you want for the field. In this case, we are using Business Name.

Word Add Field

Once you click ok, you should see something like this: <<Business Name>> in the Word document. You can go ahead and create the document with all the needed fields.

Simple Merge

Once you have the Word document created, merging the values is a simple operation. The code below contains the standard imports and defines the name of the Word file. In most cases, you will need to include the full path to the template but for simplicity, I am assuming it is in the same directory as your python scripts:

from __future__ import print_function
from mailmerge import MailMerge
from datetime import date

template = "Practical-Business-Python.docx"

To create a mailmerge document and look at all of the fields:

document = MailMerge(template)
print(document.get_merge_fields())
{'purchases', 'Business', 'address', 'discount', 'recipient', 'date', 'zip', 'status', 'phone_number', 'city', 'shipping_limit', 'state'}

To merge in the values and save the results, use document.merge with all of the variables assigned a value and document.write to save the output:

document.merge(
    status='Gold',
    city='Springfield',
    phone_number='800-555-5555',
    Business='Cool Shoes',
    zip='55555',
    purchases='$500,000',
    shipping_limit='$500',
    state='MO',
    address='1234 Main Street',
    date='{:%d-%b-%Y}'.format(date.today()),
    discount='5%',
    recipient='Mr. Jones')

document.write('test-output.docx')

Here is a sample of what the final document will look like:

Final Document

This is a simple document but pretty much anything you can do in Word can be turned into a template and populated in this manner.

Complex Merge

If you would like to replicate the results onto multiple pages, there is a shortcut called merge_pages which will take a list of dictionaries of key,value pairs and create multiple pages in a single file.

In a real world scenario you would pull the data from your master source (i.e. database, Excel, csv, etc.) and transform the data into the required dictionary format. For the purposes of keeping this simple, here are three customer dictionaries containing our output data:

cust_1 = {
    'status': 'Gold',
    'city': 'Springfield',
    'phone_number': '800-555-5555',
    'Business': 'Cool Shoes',
    'zip': '55555',
    'purchases': '$500,000',
    'shipping_limit': '$500',
    'state': 'MO',
    'address': '1234 Main Street',
    'date': '{:%d-%b-%Y}'.format(date.today()),
    'discount': '5%',
    'recipient': 'Mr. Jones'
}

cust_2 = {
    'status': 'Silver',
    'city': 'Columbus',
    'phone_number': '800-555-5551',
    'Business': 'Fancy Pants',
    'zip': '55551',
    'purchases': '$250,000',
    'shipping_limit': '$2000',
    'state': 'OH',
    'address': '1234 Elm St',
    'date': '{:%d-%b-%Y}'.format(date.today()),
    'discount': '2%',
    'recipient': 'Mrs. Smith'
}

cust_3 = {
    'status': 'Bronze',
    'city': 'Franklin',
    'phone_number': '800-555-5511',
    'Business': 'Tango Tops',
    'zip': '55511',
    'purchases': '$100,000',
    'shipping_limit': '$2500',
    'state': 'KY',
    'address': '1234 Adams St',
    'date': '{:%d-%b-%Y}'.format(date.today()),
    'discount': '2%',
    'recipient': 'Mr. Lincoln'
}

Creating a 3 page document is done by passing a list of dictionaries to the merge_pages function:

document.merge_pages([cust_1, cust_2, cust_3])
document.write('test-output-mult-custs.docx')

The output file is formatted and ready for printing or further editing.
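In practice, the hard-coded dictionaries above would likely come straight from your data source. As a minimal sketch (the file name and columns are hypothetical), csv.DictReader already yields one dictionary per row, which is exactly the shape merge_pages expects:

```python
import csv

def load_customers(path):
    # Each row becomes a dict keyed by the CSV header names,
    # which should match the merge fields in the template.
    with open(path, newline='') as f:
        return list(csv.DictReader(f))

# Hypothetical usage:
# document.merge_pages(load_customers('customers.csv'))
```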

Populating Tables

Another frequent need when generating templates is efficiently populating a table of values. In our example, we could attach an exhibit to the letter that includes the customer’s purchase history. When completing the template, we do not know how many rows to include and the challenge of naming each field would get overwhelming very quickly. Using merge_rows makes table population much easier.

To build out the template, create a standard Word table with 1 row and insert the fields in the appropriate columns. There is no special formatting required. It should look something like this:

Word Table Template

Next, we need to define a list of dictionaries for each item in the table.

sales_history = [{
    'prod_desc': 'Red Shoes',
    'price': '$10.00',
    'quantity': '2500',
    'total_purchases': '$25,000.00'
}, {
    'prod_desc': 'Green Shirt',
    'price': '$20.00',
    'quantity': '10000',
    'total_purchases': '$200,000.00'
}, {
    'prod_desc': 'Purple belt',
    'price': '$5.00',
    'quantity': '5000',
    'total_purchases': '$25,000.00'
}]

The keys in each dictionary correspond to the merge fields in the document. To build out the rows in the table:

document.merge(**cust_2)
document.merge_rows('prod_desc', sales_history)
document.write('test-output-table.docx')

In this example, we pass the dictionary to merge using the ** operator. Python expands the dictionary into the key=value arguments that the function expects. The final step is to call merge_rows to build out the rows of the table.
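If the ** syntax is new to you, here is a tiny illustration of the expansion (the function and dictionary are made up for the demo):

```python
def greet(recipient, city):
    return 'Dear {} of {}'.format(recipient, city)

info = {'recipient': 'Mrs. Smith', 'city': 'Columbus'}

# greet(**info) is equivalent to
# greet(recipient='Mrs. Smith', city='Columbus')
print(greet(**info))  # Dear Mrs. Smith of Columbus
```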

The final result has each row populated with the values we need and preserves the default table formatting we defined in the template document:

Word Table

Full Code Example

In case the process was a little confusing, here is a full example showing all of the various approaches presented in this article. In addition, the template files can be downloaded from the github repo.

from __future__ import print_function
from mailmerge import MailMerge
from datetime import date

# Define the templates - assumes they are in the same directory as the code
template_1 = "Practical-Business-Python.docx"
template_2 = "Practical-Business-Python-History.docx"

# Show a simple example
document_1 = MailMerge(template_1)
print("Fields included in {}: {}".format(template_1,
                                         document_1.get_merge_fields()))

# Merge in the values
document_1.merge(
    status='Gold',
    city='Springfield',
    phone_number='800-555-5555',
    Business='Cool Shoes',
    zip='55555',
    purchases='$500,000',
    shipping_limit='$500',
    state='MO',
    address='1234 Main Street',
    date='{:%d-%b-%Y}'.format(date.today()),
    discount='5%',
    recipient='Mr. Jones')

# Save the document as example 1
document_1.write('example1.docx')

# Try example number two where we create multiple pages
# Define a dictionary for 3 customers
cust_1 = {
    'status': 'Gold',
    'city': 'Springfield',
    'phone_number': '800-555-5555',
    'Business': 'Cool Shoes',
    'zip': '55555',
    'purchases': '$500,000',
    'shipping_limit': '$500',
    'state': 'MO',
    'address': '1234 Main Street',
    'date': '{:%d-%b-%Y}'.format(date.today()),
    'discount': '5%',
    'recipient': 'Mr. Jones'
}

cust_2 = {
    'status': 'Silver',
    'city': 'Columbus',
    'phone_number': '800-555-5551',
    'Business': 'Fancy Pants',
    'zip': '55551',
    'purchases': '$250,000',
    'shipping_limit': '$2000',
    'state': 'OH',
    'address': '1234 Elm St',
    'date': '{:%d-%b-%Y}'.format(date.today()),
    'discount': '2%',
    'recipient': 'Mrs. Smith'
}

cust_3 = {
    'status': 'Bronze',
    'city': 'Franklin',
    'phone_number': '800-555-5511',
    'Business': 'Tango Tops',
    'zip': '55511',
    'purchases': '$100,000',
    'shipping_limit': '$2500',
    'state': 'KY',
    'address': '1234 Adams St',
    'date': '{:%d-%b-%Y}'.format(date.today()),
    'discount': '2%',
    'recipient': 'Mr. Lincoln'
}

document_2 = MailMerge(template_1)
document_2.merge_pages([cust_1, cust_2, cust_3])
document_2.write('example2.docx')

# Final Example includes a table with the sales history

sales_history = [{
    'prod_desc': 'Red Shoes',
    'price': '$10.00',
    'quantity': '2500',
    'total_purchases': '$25,000.00'
}, {
    'prod_desc': 'Green Shirt',
    'price': '$20.00',
    'quantity': '10000',
    'total_purchases': '$200,000.00'
}, {
    'prod_desc': 'Purple belt',
    'price': '$5.00',
    'quantity': '5000',
    'total_purchases': '$25,000.00'
}]

document_3 = MailMerge(template_2)
document_3.merge(**cust_2)
document_3.merge_rows('prod_desc', sales_history)
document_3.write('example3.docx')

Conclusion

I am always happy to find python-based solutions that will help me get away from using MS Office automation. I am generally more proficient with python and feel that the solutions are more portable. The docx-mailmerge library is one of those simple but powerful tools that I am sure I will use on many occasions in the future.

February 21, 2017 01:25 PM


Dataquest

Pandas Cheat Sheet - Python for Data Science

Pandas is arguably the most important Python package for data science. Not only does it give you lots of methods and functions that make working with data easier, but it is also optimized for speed, which gives you a significant advantage over working with numeric data using Python’s built-in functions.

The printable version of this cheat sheet

It’s common when first learning pandas to have trouble remembering all the functions and methods that you need, and while at Dataquest we advocate getting used to consulting the pandas documentation, sometimes it’s nice to have a handy reference, so we’ve put together this cheat sheet to help you out!

If you’re interested in learning pandas, you can consult our two-part pandas tutorial blog post, or you can signup for free and start learning pandas through our interactive pandas for data science course.

Key and Imports

In this cheat sheet, we use the following shorthand:

df Any pandas DataFrame object
s Any pandas Series...
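To make the shorthand concrete, here is a small illustrative setup (the data is made up):

```python
import pandas as pd

# df: any pandas DataFrame object
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# s: any pandas Series
s = pd.Series([10, 20, 30])

print(df.shape)   # (3, 2)
print(s.mean())   # 20.0
```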

February 21, 2017 10:00 AM