
Planet Python

Last update: August 27, 2016 09:49 PM

August 27, 2016


Podcast.__init__

Episode 72 - Dave Beazley

Summary

Dave Beazley has been using and teaching Python since the early days of the language. He has also been instrumental in spreading the gospel of asynchronous programming and the many ways that it can improve the performance of your programs. This week I had the pleasure of speaking with him about his history with the language and some of his favorite presentations and projects.

Brief Introduction

  * Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  * I would like to thank everyone who has donated to the show. Your contributions help us make the show sustainable. For details on how to support the show you can visit our site at pythonpodcast.com.
  * Linode is sponsoring us this week. Check them out at linode.com/podcastinit and get a $20 credit to try out their fast and reliable Linux virtual servers for your next project. Use the promo code podcastinit20 to get a $20 credit when you sign up!
  * We are also sponsored by Sentry this week. Stop hoping your users will report bugs. Sentry's real-time tracking gives you insight into production deployments and information to reproduce and fix crashes. Check them out at getsentry.com and use the code podcastinit at signup to get a $50 credit!
  * Hired has also returned as a sponsor this week. If you're looking for a job as a developer or designer then Hired will bring the opportunities to you. Sign up at hired.com/podcastinit to double your signing bonus. On Hired, software engineers and designers can get 5+ interview requests in a week, and each offer has salary and equity upfront. With full-time and contract opportunities available, users can view the offers and accept or reject them before talking to any company. Work with over 2,500 companies, from startups to large public companies, hailing from 12 major tech hubs in North America and Europe. Hired is totally free for users, and if you get a job you'll get a $2,000 "thank you" bonus. If you use our special link to sign up, then that bonus will double to $4,000 when you accept a job. If you're not looking for a job but know someone who is, you can refer them to Hired and get a $1,337 bonus when they accept a job.
  * Visit our site to subscribe to our show, sign up for our newsletter, read the show notes, and get in touch.
  * To help other people find the show you can leave a review on iTunes, or Google Play Music, and tell your friends and co-workers.
  * Join our community! Visit discourse.pythonpodcast.com for your opportunity to find out about upcoming guests, suggest questions, and propose show ideas.
  * Your hosts as usual are Tobias Macey and Chris Patti.
  * Today we're interviewing Dave Beazley about his career with Python.

Interview with Dave Beazley

  * Introductions
  * How did you get introduced to Python? - Tobias
  * How has Python and its community helped to shape your career? - Tobias
  * What are some of the major themes that you have focused on in your work? - Tobias
  * One of the things that you are known for is doing live-coding presentations, many of which are fairly advanced. What is it about that format that appeals to you? - Tobias
  * What are some of your favorite stories about a presentation that didn't quite go as planned? - Tobias
  * You have given a large number of talks at various conferences. What are some of your favorites? - Tobias
  * What impact do you think that asynchronous programming will have on the future of the Python language and ecosystem? - Tobias
  * Are there any features that you see in other languages that you would like to have incorporated in Python? - Tobias
  * On the about page for your website you talk about some of the low-level code and hardware knowledge that you picked up by working with computers as a kid. Do you think that people who are getting started with programming now are missing out by not get...

Keep In Touch

Picks

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

August 27, 2016 08:47 PM


Wesley Chun

Google APIs: migrating from tools.run() to tools.run_flow()

Got AttributeError? As in: AttributeError: 'module' object has no attribute 'run'? Rename run() to run_flow(), and you'll be good-to-go. TL;DR: This mini-tutorial slash migration guide slash PSA (public service announcement) is aimed at Python developers using the Google APIs Client Library (to access Google APIs from their applications) who are currently calling oauth2client.tools.run(), likely getting an exception (see Jan 2016 update below), and need to migrate to oauth2client.tools.run_flow(), its replacement.

UPDATE (Aug 2016): The flags parameter in run_flow() function became optional in Feb 2016, so tweaked the blogpost to reflect that.

UPDATE (Jun 2016): Revised the code and cleaned up the dialog so there are no longer any instances of using run() function, significantly shortening this post.

UPDATE (Jan 2016): The tools.run() function itself was forcibly removed (without a fallback) in Aug 2015, so if you're using any release on or after that, any such calls from your code will throw an exception (AttributeError: 'module' object has no attribute 'run'). To fix this problem, continue reading.

Prelude

We're going to continue our look at accessing Google APIs from Python. In addition to the previous pair of posts (http://goo.gl/57Gufk and http://goo.gl/cdm3kZ), as part of my day job, I've been working on corresponding video content, some of which are tied specifically to posts on this blog.

In this follow-up, we're going to specifically address the sidebar in the previous post, where we bookmarked an item for future discussion where the future is now: in the oauth2client package, tools.run() has been deprecated by tools.run_flow(). Note you need at least Python 2.7 or 3.3 to use the Google APIs Client Library. (If you didn't even know Python 3 was supported at all, then you need to see this post and this Quora Q&A.)

Replacing tools.run() with tools.run_flow()

Now let's convert the authorized access to Google APIs code from using tools.run() to tools.run_flow(). Here is the old snippet I'm talking about that needs upgrading:
from apiclient import discovery
from httplib2 import Http
from oauth2client import file, client, tools

SCOPES = # one or more scopes (str or iterable)
store = file.Storage('storage.json')
creds = store.get()
if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
    creds = tools.run(flow, store)

SERVICE = discovery.build(API, VERSION, http=creds.authorize(Http()))
If you're using the latest Client Library (as of Feb 2016), all you need to do is change the tools.run() call to tools.run_flow(). Everything else stays exactly the same:
from apiclient import discovery
from httplib2 import Http
from oauth2client import file, client, tools

SCOPES = # one or more scopes (str or iterable)
store = file.Storage('storage.json')
creds = store.get()
if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
    creds = tools.run_flow(flow, store)
If you don't have the latest Client Library, then your update involves the extra steps of adding lines that import argparse and using it to get the flags argument needed by tools.run_flow(), plus the actual change from tools.run():
import argparse

from apiclient import discovery
from httplib2 import Http
from oauth2client import file, client, tools

SCOPES = # one or more scopes (str or iterable)
store = file.Storage('storage.json')
creds = store.get()
if not creds or creds.invalid:
    flags = argparse.ArgumentParser(parents=[tools.argparser]).parse_args()
    flow = client.flow_from_clientsecrets('client_id.json', SCOPES)
    creds = tools.run_flow(flow, store, flags)

SERVICE = discovery.build(API, VERSION, http=creds.authorize(Http()))

Command-line argument processing, or "Why argparse?"

Python has had several modules in the Standard Library that allow developers to process command-line arguments. The original one was getopt, which mirrored the getopt() function from C. In Python 2.3, optparse was introduced, featuring more powerful processing capabilities. However, it was deprecated in 2.7 in favor of a similar module, argparse. (To find out more about their similarities, differences and the rationale behind developing argparse, see PEP 389 and this argparse docs page.) For the purposes of using Google APIs, you're all set if you're using Python 2.7, as it's included in the Standard Library. Otherwise Python 2.3-2.6 users can install it with: "pip install -U argparse".
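If your own script needs additional command-line flags on top of the OAuth2 ones, you can chain your parser to tools.argparser; here is a minimal sketch (the --query flag is just a made-up example, not something the library requires):

import argparse
from oauth2client import tools

# Build a parser that inherits oauth2client's standard OAuth2 flags
# (e.g. --noauth_local_webserver) and add a hypothetical flag of our own.
parser = argparse.ArgumentParser(parents=[tools.argparser])
parser.add_argument('--query', default='python', help='example app-specific flag')
flags = parser.parse_args()

# `flags` can then be passed straight to tools.run_flow(flow, store, flags).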

Regardless of whether you need argparse, once you migrate to either snippet with tools.run_flow(), your application should go back to working the way it had before.

August 27, 2016 11:23 AM


Weekly Python StackOverflow Report

(xxxiv) stackoverflow python report

These are the ten most rated questions at Stack Overflow last week.
Between brackets: [question score / answers count]
Build date: 2016-08-27 07:54:02 GMT


  1. Can a line of Python code know its indentation nesting level? - [52/4]
  2. Better way to swap elements in list? - [21/11]
  3. Imported a Python module; why does reassigning a member in it also affect an import elsewhere? - [17/5]
  4. How to get a python script to invoke "python -i" when called normally? - [17/5]
  5. NumPy performance: uint8 vs. float and multiplication vs. division? - [12/3]
  6. Updating a list within a tuple - [10/1]
  7. Matching Unicode word boundaries in Python - [10/1]
  8. How do I release memory used by a pandas dataframe? - [9/2]
  9. Why does printing a dataframe break python when constructed from numpy empty_like - [9/1]
  10. Performance between C-contiguous and Fortran-contiguous array operations - [7/2]

August 27, 2016 07:54 AM

August 26, 2016


Péter Szabó

Binary search in line-sorted text files

This blog post announces pts-line-bisect, a C program and an equivalent Python script and library for doing binary search in line-sorted text files, and it also explains some of the design details of the Python implementation.

Let's suppose you have a sorted text file, in which each line is lexicographically larger than the previous one, and you want to find a specific range of lines, or all lines with a specific prefix.

I've written the program pts-line-bisect for that recently. See the beginning of the README for many examples of how to use the C program. The Python program has fewer features; here is how to use it:

Please note that these tools support duplicate lines correctly (i.e. if the same line appears multiple times, in a block).

Please note that these tools and libraries assume that the key is at the beginning of the line. If you have a sorted file whose lines contain the sort key somewhere in the middle, you need to make changes to the code. It's possible to do so in a way that the code will still support duplicate keys (i.e. records with the same key).

Analysis

I've written the article titled Evolution of a binary search implementation for line-sorted text files about the topic, containing the problem statement, the possible pitfalls, an analysis of some incorrect solutions available on the web as code example, a detailed specification and explanation of my solution (including a proof), disk seek and speed analysis, a set of speedup ideas and their implementation, and further notes about the speedups in the C implementation.

As a teaser, here is an incorrect solution in Python:

def bisect_left_incorrect(f, x):
    """... Warning: Incorrect implementation with corner case bugs!"""
    x = x.rstrip('\n')
    f.seek(0, 2)  # Seek to EOF.
    lo, hi = 0, f.tell()
    while lo < hi:
        mid = (lo + hi) >> 1
        f.seek(mid)
        f.readline()  # Ignore previous line, find our line.
        if x <= f.readline().rstrip('\n'):
            hi = mid
        else:
            lo = mid + 1
    return lo

Can you spot all 3 bugs?

Read the article for all the details and the solution.

About sorting: For the above implementation to work, files must be sorted lexicographically, using the unsigned value (0..255) for each byte in the line. If you don't sort the file, or you sort it using some language collation or some locale, then the linked implementation won't find the results you are looking for. On Linux, use LC_ALL=C sort <in.txt >out.txt to sort lexicographically.
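If you prefer to do the sorting from Python instead of the sort(1) command, a minimal sketch of an equivalent byte-wise sort (assuming the whole file fits into memory) could look like this:

# Byte-wise (unsigned 0..255) sort, equivalent to `LC_ALL=C sort <in.txt >out.txt`.
# Assumption: the file fits into memory and lines are newline-terminated.
with open('in.txt', 'rb') as f:
    lines = f.read().split(b'\n')
if lines and lines[-1] == b'':
    lines.pop()  # drop the empty chunk after the final newline
lines.sort()     # bytes compare byte by byte as unsigned values
with open('out.txt', 'wb') as f:
    f.write(b'\n'.join(lines) + b'\n')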

As a reference, here is the correct implementation of the same algorithm for finding the start of the interval in a sorted list or other sequences (based on the bisect module):

def bisect_left(a, x, lo=0, hi=None):
    """Return the index where to insert item x in list a, assuming a is sorted.

    The return value i is such that all e in a[:i] have e < x, and all e in
    a[i:] have e >= x. So if x already appears in the list, a.insert(x) will
    insert just before the leftmost x already there.

    Optional args lo (default 0) and hi (default len(a)) bound the
    slice of a to be searched.
    """
    if lo < 0:
        raise ValueError('lo must be non-negative')
    if hi is None:
        hi = len(a)
    while lo < hi:
        mid = (lo + hi) >> 1
        if x <= a[mid]:  # Change `<=' to `<', and you get bisect_right.
            hi = mid
        else:
            lo = mid + 1
    return lo

A typical real-world use case for such a binary search tool is retrieving lines corresponding to a particular time range in log files (or time-based measurement records). These files are text files with variable-length lines, with the log timestamp at the beginning of the line, and they are generated in increasing timestamp order. Unfortunately the lines are not lexicographically sorted, so the timestamp has to be decoded first for the comparison. The bsearch tool does that; it also supports parsing arbitrary, user-specifiable datetime formats, and it can binary search in gzip(1)ped files as well (by building an index). It also has high performance and low overhead, partially because it is written in C++. So bsearch is a practical tool with lots of useful features. If you need anything more complicated than a lexicographic binary search, use it instead.

Before getting too excited about binary search, please note that there are much faster alternatives for data on disk. In an external (file-based) binary search the slowest operation is the disk seek needed for each bisection. Most of the software and hardware components would be waiting for the hard disk to move the reading head. (Except, of course, that seek times are negligible when non-spinning storage hardware such as SSD or memory card is used.)

An out-of-the-box solution would be adding the data to a more disk-efficient key-value store. There are several programs providing such stores. Most of them are based on a B-tree, B*-tree or B+-tree data structure if sorted iteration and range searches have to be supported, or disk-based hashtables otherwise. Some of the high-performance single-machine key-value stores: cdb (read-only), Tokyo Cabinet, Kyoto Cabinet, LevelDB; see more in the NoSQL software list.

The fundamental speed difference between a B-tree search and a binary search in a sorted list stems from the fact that B-trees have a branching factor larger than 2 (possibly 100s or 1000s), thus each seeking step in a B-tree search reduces the possible input size by a factor larger than 2, while in a binary search each step reduces the input size by a factor of 2 only (i.e. we keep either the bottom half or the top half). So both kinds of searches are logarithmic, but the base of the logarithm is different, and this causes a constant factor difference in the disk seek count. By careful tuning of the constants in B-trees it's usual to have only 2 or 3 disk seeks for each search even for 100GB of data, while a binary search in such a large file with 50 bytes per record would need 31 disk seeks. By taking 10ms as the seek time (see more info about typical hard disk seek times), a typical B-tree search takes 0.03 second, and a typical binary search takes 0.31 second.
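As a quick sanity check of those figures, here is the arithmetic spelled out (using the same assumed numbers as above: 100GB of data, 50 bytes per record, 10ms per seek):

import math

records = 100e9 / 50                           # 2e9 records in the file
binary_seeks = math.ceil(math.log2(records))   # ~31 bisection steps, one seek each
btree_seeks = 3                                # typical for a well-tuned B-tree
print(binary_seeks * 0.010)                    # ~0.31 seconds per lookup
print(btree_seeks * 0.010)                     # ~0.03 seconds per lookup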

Have fun binary searching, but don't forget to sort your files first!

August 26, 2016 04:43 PM


Wesley Chun

Authorized Google API access from Python (part 2 of 2)

NOTE: You can also watch a video walkthrough of the common code covered in this blogpost here.

UPDATE (Aug 2016): The code has been modernized to use oauth2client.tools.run_flow() instead of the deprecated oauth2client.tools.run(). You can read more about that change here.

UPDATE (Jun 2016): Updated to Python 2.7 & 3.3+ and Drive API v3.

Introduction

In this final installment of a (currently) two-part series introducing Python developers to building on Google APIs, we'll extend from the simple API example from the first post (part 1) just over a month ago. Those first snippets showed some skeleton code and a short real working sample that demonstrated accessing a public (Google) API with an API key (it queried public Google+ posts). An API key, however, does not grant applications access to authorized data.

Authorized data, including user information such as personal files on Google Drive and YouTube playlists, require additional security steps before access is granted. Sharing of and hardcoding credentials such as usernames and passwords is not only insecure, it's also a thing of the past. A more modern approach leverages token exchange, authenticated API calls, and standards such as OAuth2.

In this post, we'll demonstrate how to use Python to access authorized Google APIs using OAuth2, specifically listing the files (and folders) in your Google Drive. In order to better understand the example, we strongly recommend you check out the OAuth2 guides (general OAuth2 info, OAuth2 as it relates to Python and its client library) in the documentation to get started.

The docs describe the OAuth2 flow: making a request for authorized access, having the user grant access to your app, and obtaining a(n access) token with which to sign and make authorized API calls. The steps you need to take to get started begin nearly the same way as for simple API access. The process diverges when you arrive on the Credentials page when following the steps below.

Google API access

In order to get authorized Google API access, follow these instructions (the first three of which are roughly the same as for simple API access):
NOTEs: Instructions from the previous blogpost were to get an API key. This time, in the steps above, we're creating and downloading OAuth2 credentials. You can also watch a video walkthrough of this app setup process of getting simple or authorized access credentials in the "DevConsole" here.

Accessing Google APIs from Python

In order to access authorized Google APIs from Python, you still need the Google APIs Client Library for Python, so in this case, do follow those installation instructions from part 1.

We will again use the apiclient.discovery.build() function, which is what we need to create a service endpoint for interacting with an API, authorized or otherwise. However, for authorized data access, we need additional resources, namely the httplib2 and oauth2client packages. Here are the first five lines of the new boilerplate code for authorized access:

from __future__ import print_function
from apiclient import discovery
from httplib2 import Http
from oauth2client import file, client, tools

SCOPES = # one or more scopes (strings)
SCOPES is a critical variable: it represents the set of scopes of authorization an app wants to obtain (then access) on behalf of user(s). What does a scope look like?

Each scope is a string, specifically a URL. Here are some examples:
You can request one or more scopes, given as a single space-delimited string of scopes or an iterable (list, generator expression, etc.) of strings.  If you were writing an app that accesses both your YouTube playlists as well as your Google+ profile information, your SCOPES variable could be either of the following:
SCOPES = 'https://www.googleapis.com/auth/plus.me https://www.googleapis.com/auth/youtube'

That one is a single space-delimited string; or it could be an easier-to-read, non-wrapped tuple:

SCOPES = (
    'https://www.googleapis.com/auth/plus.me',
    'https://www.googleapis.com/auth/youtube',
)

Our example command-line script will just list the files on your Google Drive, so we only need the read-only Drive metadata scope, meaning our SCOPES variable will be just this:
SCOPES = 'https://www.googleapis.com/auth/drive.metadata.readonly'
The next section of boilerplate represents the security code:
store = file.Storage('storage.json')
creds = store.get()
if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets('client_id.json', SCOPES)
    creds = tools.run_flow(flow, store)
Once the user has authorized access to their personal data by your app, a special "access token" is given to your app. This precious resource must be stored somewhere local for the app to use. In our case, we'll store it in a file called "storage.json". The lines setting the store and creds variables are attempting to get a valid access token with which to make an authorized API call.

If the credentials are missing or invalid, such as being expired, the authorization flow (using the client secret you downloaded along with a set of requested scopes) must be created (by client.flow_from_clientsecrets()) and executed (by tools.run_flow()) to ensure possession of valid credentials. The client_id.json or client_secret.json file is the credentials file you saved when you clicked "Download JSON" from the DevConsole after you've created your OAuth2 client ID.

If you don't have credentials at all, the user must explicitly grant permission — I'm sure you've all seen the OAuth2 dialog describing the type of access an app is requesting (remember those scopes?). Once the user clicks "Accept" to grant permission, a valid access token is returned and saved into the storage file (because you passed a handle to it when you called tools.run_flow()).

Note: tools.run() deprecated by tools.run_flow()
You may have seen usage of the older tools.run() function, but it has been deprecated by tools.run_flow(). We explain this in more detail in another blogpost specifically geared towards migration.

Once the user grants access and valid credentials are saved, you can create one or more endpoints to the secure service(s) desired with apiclient.discovery.build(), just like with simple API access. Its call will look slightly different, mainly in that you need to sign your HTTP requests with your credentials rather than passing an API key:

DRIVE = discovery.build(API, VERSION, http=creds.authorize(Http()))

In our example, we're going to list your files and folders in your Google Drive, so for API, use the string 'drive'. The API is currently on version 3 so use 'v3' for VERSION:

DRIVE = discovery.build('drive', 'v3', http=creds.authorize(Http()))

If you want to get comfortable with OAuth2, what its flow is and how it works, we recommend that you experiment at the OAuth Playground. There you can choose from any number of APIs to access and experience first-hand how your app must be authorized to access personal data.

Going back to our working example, once you have an established service endpoint, you can use the list() method of the files service to request the file data:

files = DRIVE.files().list().execute().get('files', [])

If there's any data to read, the response dict will contain an iterable of files that we can loop over (or default to an empty list so the loop doesn't fail), displaying file names and types:

for f in files:
    print(f['name'], f['mimeType'])

Conclusion

To find out more about the input parameters as well as all the fields that are in the response, take a look at the docs for files().list(). For more information on what other operations you can execute with the Google Drive API, take a look at the reference docs and check out the companion video for this code sample. That's it!

Below is the entire script for your convenience:
'''
drive_list.py -- Google Drive API authorized demo
updated Aug 2016 by +WesleyChun/@wescpy
'''
from __future__ import print_function

from apiclient import discovery
from httplib2 import Http
from oauth2client import file, client, tools

SCOPES = 'https://www.googleapis.com/auth/drive.metadata.readonly'
store = file.Storage('storage.json')
creds = store.get()
if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets('client_id.json', SCOPES)
    creds = tools.run_flow(flow, store)

DRIVE = discovery.build('drive', 'v3', http=creds.authorize(Http()))
files = DRIVE.files().list().execute().get('files', [])
for f in files:
    print(f['name'], f['mimeType'])
When you run it, you should see pretty much what you'd expect, a list of file or folder names followed by their MIMEtypes — I named my script drive_list.py:
$ python3 drive_list.py
Google Maps demo application/vnd.google-apps.spreadsheet
Overview of Google APIs - Sep 2014 application/vnd.google-apps.presentation
tiresResearch.xls application/vnd.google-apps.spreadsheet
6451_Core_Python_Schedule.doc application/vnd.google-apps.document
out1.txt application/vnd.google-apps.document
tiresResearch.xls application/vnd.ms-excel
6451_Core_Python_Schedule.doc application/msword
out1.txt text/plain
Maps and Sheets demo application/vnd.google-apps.spreadsheet
ProtoRPC Getting Started Guide application/vnd.google-apps.document
gtaskqueue-1.0.2_public.tar.gz application/x-gzip
Pull Queues application/vnd.google-apps.folder
gtaskqueue-1.0.1_public.tar.gz application/x-gzip
appengine-java-sdk.zip application/zip
taskqueue.py text/x-python-script
Google Apps Security Whitepaper 06/10/2010.pdf application/pdf
Obviously your output will be different, depending on what files are in your Google Drive. But that's it... hope this is useful. You can now customize this code for your own needs and/or to access other Google APIs. Thanks for reading!

EXTRA CREDIT: To test your skills, add functionality to this code that also displays the last modified timestamp, the file (byte)size, and perhaps shave the MIMEtype a bit as it's slightly harder to read in its entirety... perhaps take just the final path element? One last challenge: in the output above, we have both Microsoft Office documents as well as their auto-converted versions for Google Apps... perhaps only show the filename once and have a double-entry for the filetypes!

August 26, 2016 01:29 PM

Simple Google API access from Python (part 1 of 2)

NOTE: You can also watch a video walkthrough of the common code covered in this blogpost here.

UPDATE (Aug 2016): The code has been modernized to recognize that the Client Library is available for Python 2 or 3.

Introduction

Back in 2012 when I published Core Python Applications Programming, 3rd ed., I posted about how I integrated Google technologies into the book. The only problem is that I presented very specific code for Google App Engine and Google+ only. I didn't show a generic way how, using pretty much the same boilerplate Python snippet, you can access any number of Google APIs; so here we are.

In this multi-part series, I'll break down the code that allows you to leverage Google APIs to the most basic level (even for Python), so you can customize as necessary for your app, whether it's running as a command-line tool or something server-side in the cloud backending Web or mobile clients. If you've got the book and played around with our Google+ API example, you'll find this code familiar, if not identical — I'll go into more detail here, highlighting the common code for generic API access and then bring in the G+-relevant code later.

We'll start in this first post by demonstrating how to access public or unauthorized data from Google APIs. (The next post will illustrate how to access authorized data from Google APIs.) Regardless of which you use, the corresponding boilerplate code stands alone. In fact, it's probably best if you saved these generic snippets in a library module so you can (re)use the same bits for any number of apps which access any number of modern Google APIs.

Google API access

In order to access Google APIs, follow these instructions:
NOTE: You can also watch a video walkthrough of this app setup process in the "DevConsole" here.

Accessing Google APIs from Python

Now that you're set up, everything else is done on the Python side. To talk to a Google API, you need the Google APIs Client Library for Python, specifically the apiclient.discovery.build() function. Download and install the library in your usual way, for example:

$ pip install -U google-api-python-client  # or pip3 for 3.x
NOTE: If you're building a Python App Engine app, you'll need something else, the Google APIs Client Library for Python on Google App Engine. It's similar but has extra goodies (specifically decorators — brief generic intro to those in my previous post) just for cloud developers that must be installed elsewhere. As App Engine developers know, libraries must be in the same location on the filesystem as your source code.
Once everything is installed, make sure that you can import apiclient.discovery:

$ python
Python 2.7.6 (default, Apr  9 2014, 11:48:52)
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.38)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import apiclient.discovery
>>>

In discovery.py is the build() function, which is what we need to create a service endpoint for interacting with an API. Now craft the following lines of code in your command-line tool, using the shorthand from-import statement instead:

from apiclient import discovery

API_KEY = # copied from project credentials page
SERVICE = discovery.build(API, VERSION, developerKey=API_KEY)

Take the API key you copied from the credentials page and assign it to the API_KEY variable as a string. Obviously, embedding an API key in source code isn't something you'd do in practice as it's not secure whatsoever — stick it in a database, key broker, encrypt it, or at least have it in a separate byte code (.pyc/.pyo) file that you import — but we'll allow it now solely for illustrative purposes of a simple command-line script.
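One low-tech option along those lines is to keep the key in a separate module that stays out of version control; a minimal sketch (the settings.py name is just an example, not part of the client library):

# settings.py -- hypothetical module kept out of version control (e.g. via .gitignore)
API_KEY = 'REPLACE-WITH-YOUR-API-KEY'

# in your main script:
#     from settings import API_KEY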

In our short example we're going to do a simple search for "python" in public Google+ posts, so for the API variable, use the string 'plus'. The API is currently on version 1 (at the time of this writing), so use 'v1' for VERSION. (Each API will use a different name and version string... again, you can find those in the OAuth Playground or in the docs for the specific API you want to use.) Here's the call once we've filled in those variables:

GPLUS = discovery.build('plus', 'v1', developerKey=API_KEY)

We need a template for the results that come back. There are many fields in a Google+ post, so we're only going to pick three to display... the user name, post timestamp, and a snippet of the post itself:

TMPL = '''
    User: %s
    Date: %s
    Post: %s
'''

Now for the code. Google+ posts are activities (known as "notes"; there are other activities as well). One of the methods you have access to is search(), which lets you query public activities; so that's what we're going to use. Add the following call using the GPLUS service endpoint you already created using the verbs we just described, and execute it:

items = GPLUS.activities().search(query='python').execute().get('items', [])

If all goes well, the (JSON) response payload will contain a set of 'items' (else we assign an empty list for the for loop). From there, we'll loop through each matching post, do some minor string manipulation to replace all whitespace characters (including NEWLINEs [ \n ]) with spaces, and display if not blank:

for data in items:
    post = ' '.join(data['title'].strip().split())
    if post:
        print(TMPL % (data['actor']['displayName'],
                      data['published'], post))


Conclusion

To find out more about the input parameters as well as all the fields that are in the response, take a look at the docs. Below is the entire script missing only the API_KEY which you'll have to fill in yourself.

from __future__ import print_function
from apiclient import discovery

TMPL = '''
User: %s
Date: %s
Post: %s
'''

API_KEY = # copied from project credentials page
GPLUS = discovery.build('plus', 'v1', developerKey=API_KEY)
items = GPLUS.activities().search(query='python').execute().get('items', [])
for data in items:
    post = ' '.join(data['title'].strip().split())
    if post:
        print(TMPL % (data['actor']['displayName'],
                      data['published'], post))

When you run it, you should see pretty much what you'd expect, a few posts on Python, some on Monty Python, and of course, some on the snake — I called my script plus_search.py:

$ python plus_search.py # or python3

User: Jeff Ward
Date: 2014-09-20T18:08:23.058Z
Post: How to make python accessible in the command window.


User: Fayland Lam
Date: 2014-09-20T16:40:11.512Z
Post: Data Engineer http://findmjob.com/job/AB7ZKitA5BGYyW1oAlQ0Fw/Data-Engineer.html #python #hadoop #jobs...


User: Willy's Emporium LTD
Date: 2014-09-20T16:19:33.851Z
Post: MONTY PYTHON QUOTES MUG Take a swig to wash down all that albatross and crunchy frog. Featuring 20 ...


User: Doddy Pal
Date: 2014-09-20T15:49:54.405Z
Post: Classic Monty Python!!!


User: Sebastian Huskins
Date: 2014-09-20T15:33:00.707Z
Post: Made a small python script to get shellcode out of an executable. I found a nice commandlinefu.com oneline...

EXTRA CREDIT: To test your skills, check the docs and add a fourth line to each output which is the URL/link to that specific post, so that you (and your users) can open a browser to it if of interest.

If you want to build on from here, check out the larger app using the Google+ API featured in Chapter 15 of the book — it adds some brains to this basic code where the Google+ posts are sorted by popularity using a "chatter" score. That just about wraps it up for this post. Once you're good to go, then you're ready to learn how to perform authorized Google API access in part 2 of this two-part series!

August 26, 2016 12:44 PM

Accessing Gmail from Python (plus BONUS)

NOTE: The code covered in this blogpost is also available in a video walkthrough here.

UPDATE (Aug 2016): The code has been modernized to use oauth2client.tools.run_flow() instead of the deprecated oauth2client.tools.run(). You can read more about that change here.

Introduction

The last several posts have illustrated how to connect to public/simple and authorized Google APIs. Today, we're going to demonstrate accessing the Gmail (another authorized) API. Yes, you read that correctly... "API." In the old days, you accessed mail services with standard Internet protocols such as IMAP/POP and SMTP. However, while they are standards, they haven't kept up with modern-day email usage and the developers' needs that go along with it. In comes the Gmail API, which provides CRUD access to email threads and drafts along with messages, search queries, management of labels (like folders), and domain administration features that are an extra concern for enterprise developers.

Earlier posts demonstrated the structure of, and the "how-to" for, using Google APIs in general, so the most recent posts, including this one, focus on solutions and apps and the use of specific APIs. Once you review the earlier material, you're ready to start with Gmail scopes and then see how to use the API itself.

Gmail API Scopes

Below are the Gmail API scopes of authorization. We're listing them in most-to-least restrictive order because that's the order you should consider using them in: use the most restrictive scope you possibly can while still allowing your app to do its work. This makes your app more secure and may prevent inadvertently going over any quotas, or accessing, destroying, or corrupting data. Also, users are less hesitant to install your app if it asks only for more restricted access to their inboxes.
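For reference, the commonly used Gmail API scopes look roughly like the list below, from most to least restrictive; treat the exact strings and ordering as an assumption and confirm them against the current Gmail API documentation:

# Representative Gmail API scopes, roughly most-to-least restrictive (verify
# against the official Gmail API docs before relying on them).
GMAIL_SCOPES = [
    'https://www.googleapis.com/auth/gmail.readonly',  # read-only access
    'https://www.googleapis.com/auth/gmail.labels',    # manage labels only
    'https://www.googleapis.com/auth/gmail.send',      # send messages only
    'https://www.googleapis.com/auth/gmail.insert',    # insert/import messages
    'https://www.googleapis.com/auth/gmail.compose',   # create and send drafts
    'https://www.googleapis.com/auth/gmail.modify',    # read/write, except permanent delete
    'https://mail.google.com/',                        # full access to the mailbox
]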

Using the Gmail API

We're going to create a sample Python script that goes through your Gmail threads and looks for those which have more than 2 messages, for example, if you're seeking particularly chatty threads on mailing lists you're subscribed to. Since we're only peeking at inbox content, the only scope we'll request is 'gmail.readonly', the most restrictive scope. The API string is 'gmail' which is currently on version 1, so here's the call to apiclient.discovery.build() you'll use:

GMAIL = discovery.build('gmail', 'v1', http=creds.authorize(Http()))

Note that all of the code above that line is predominantly boilerplate (explained in earlier posts). Anyway, once you have an established service endpoint with build(), you can use the list() method of the threads service to request the thread data. The one required parameter is the user's Gmail address. A special value of 'me' has been set aside for the currently authenticated user.
threads = GMAIL.users().threads().list(userId='me').execute().get('threads', [])
If all goes well, the (JSON) response payload will (not be empty or missing and) contain a sequence of threads that we can loop over. For each thread, we need to fetch more info, so we issue a second API call for that. Specifically, we care about the number of messages in a thread:
for thread in threads:
    tdata = GMAIL.users().threads().get(userId='me', id=thread['id']).execute()
    nmsgs = len(tdata['messages'])
We're only interested in threads with more than 2 (that is, at least 3) messages, discarding the rest. If a thread meets that criterion, scan the first message and cycle through the email headers looking for the "Subject" line to display to users, skipping the remaining headers as soon as we find one:
    if nmsgs > 2:
        msg = tdata['messages'][0]['payload']
        subject = ''
        for header in msg['headers']:
            if header['name'] == 'Subject':
                subject = header['value']
                break
        if subject:
            print('%s (%d msgs)' % (subject, nmsgs))
If you're on many mailing lists, this may give you more messages than desired, so feel free to up the threshold from 2 to 50, 100, or whatever makes sense for you. (In that case, you should use a variable.) Regardless, that's pretty much the entire script save for the OAuth2 code that we're so familiar with from previous posts. The script is posted below in its entirety, and if you run it, you'll see an interesting collection of threads... YMMV depending on what messages are in your inbox:
$ python3 gmail_threads.py
[Tutor] About Python Module to Process Bytes (3 msgs)
Core Python book review update (30 msgs)
[Tutor] scratching my head (16 msgs)
[Tutor] for loop for long numbers (10 msgs)
[Tutor] How to show the listbox from sqlite and make it searchable? (4 msgs)
[Tutor] find pickle and retrieve saved data (3 msgs)

BONUS: Python 3!

As of Mar 2015 (formally in Apr 2015 when the docs were updated), support for Python 3 was added to Google APIs Client Library (3.3+)! This update was a long time coming (relevant GitHub thread), and allows Python 3 developers to write code that accesses Google APIs. If you're already running 3.x, you can use its pip command (pip3) to install the Client Library:

$ pip3 install -U google-api-python-client

Because of this, unlike previous blogposts, we're deliberately going to avoid use of the print statement and switch to the print() function instead. If you're still running Python 2, be sure to add the following import so that the code will also run in your 2.x interpreter:

from __future__ import print_function

Conclusion

To find out more about the input parameters as well as all the fields that are in the response, take a look at the docs for threads().list(). For more information on what other operations you can execute with the Gmail API, take a look at the reference docs and check out the companion video for this code sample. That's it!

Below is the entire script for your convenience which runs on both Python 2 and Python 3 (unmodified!):
#!/usr/bin/env python

from __future__ import print_function
from apiclient import discovery
from httplib2 import Http
from oauth2client import file, client, tools

SCOPES = 'https://www.googleapis.com/auth/gmail.readonly'
store = file.Storage('storage.json')
creds = store.get()
if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
    creds = tools.run_flow(flow, store)
GMAIL = discovery.build('gmail', 'v1', http=creds.authorize(Http()))
threads = GMAIL.users().threads().list(userId='me').execute().get('threads', [])
for thread in threads:
    tdata = GMAIL.users().threads().get(userId='me', id=thread['id']).execute()
    nmsgs = len(tdata['messages'])

    if nmsgs > 2:
        msg = tdata['messages'][0]['payload']
        subject = ''
        for header in msg['headers']:
            if header['name'] == 'Subject':
                subject = header['value']
                break
        if subject:
            print('%s (%d msgs)' % (subject, nmsgs))

You can now customize this code for your own needs, for a mobile frontend, a server-side backend, or to access other Google APIs. If you want to see another example of using the Gmail API (displaying all your inbox labels), check out the Python Quickstart example in the official docs or its equivalent in Java (server-side, Android), iOS (Objective-C, Swift), C#/.NET, PHP, Ruby, JavaScript (client-side, Node.js), or Go. That's it... hope you find these code samples useful in helping you get started with the Gmail API!

EXTRA CREDIT: To test your skills and challenge yourself, try writing code that allows users to perform a search across their email, or perhaps creating an email draft, adding attachments, then sending them! Note that to prevent spam, there are strict Program Policies that you must abide by... any abuse could rate limit your account or get it shut down. Check out those rules plus other Gmail terms of use here.

August 26, 2016 12:22 PM


Talk Python to Me

#73 Machine learning at the new Microsoft

In this episode we catch up with David Crook, a developer evangelist at Microsoft. He is a co-organizer for the Fort Lauderdale Machine Learning User Group and is involved in many more user groups and meetups. You'll hear about some really cool projects where they are using Python and TensorFlow to work on simple things like growing more food to help feed the world.

Links from the show:

David on Twitter: @data4bots (https://twitter.com/data4bots)
David on the web: http://dacrook.com/
Fort Lauderdale machine learning UG: https://www.meetup.com/Fort-Lauderdale-Machine-Learning-Meetup/
Azure machine learning: https://azure.microsoft.com/en-us/services/machine-learning/
TensorFlow: https://www.tensorflow.org/

August 26, 2016 08:00 AM


S. Lott

On Generator Functions, Yield and Return

Here's the question, lightly edited to remove the garbage. (Sometimes I'm charitable and call it "rambling". Today, I'm not feeling charitable about the garbage writing style filled with strange assumptions instead of questions.)

someone asked if you could have both a yield and a return in the same ... function/iterator. There was debate and the senior people said, let's actually write code. They wrote code and proved that couldn't have both a yield and a return in the same ... function/iterator. .... 
The meeting moved on w/out anyone asking the why question. Why doesn't it make sense to have both a yield and a return. ...

The impact of the yield statement can be confusing. Writing code to mess around with it was somehow unhelpful. And the shocking "proved that couldn't have both a yield and a return in the same ... function" is a serious problem.

(Or a seriously incorrect summary of the conversation; a very real possibility considering the garbage-encrusted email. Or a sign that Python 3 isn't widely-enough used and the email omitted this essential fact. And yes, I'm being overly sensitive to the garbage. But there's a better way to come to grips with reality and it involves asking questions and parsing details instead of repeating assumptions and writing garbage.)

An example


>>> def silly(n, stop=None):
...     for i in range(n):
...         if i == stop: return
...         yield i
...
>>> list(silly(5))
[0, 1, 2, 3, 4]
>>> list(silly(5, stop=3))
[0, 1, 2]

This works in both Python 3.5.1 and 2.7.10.

Some discussion

A definition with no yield is a conventional function: the parameters from some domain are mapped to a return value in some range. Each mapping is a single evaluation of the function with concrete argument values.

A definition with a yield statement becomes an iterable generator of (potentially) multiple values. The return statement changes its behavior slightly. It no longer defines the one (and only) return value. In a generator function (one that has a yield) the return statement can be thought of as if it raised the StopIteration exception as a way to exit from the generator.

As can be seen in the example above, both statements are in one function. They both work to provide expected semantics.

The code which gets an error is this:

>>> def silly(n, stop=3):
...     for i in range(n):
...         if i == stop: return "boom!"
...         yield i


The "why?" question is should -- perhaps -- be obvious at this point.  The return raises an exception; it doesn't provide a value.

The topic, however, remains troubling. The phrase "have both a yield and a return" is bothersome because it fails to recognize that the yield statement has a special role. The yield statement transforms the semantics of the function to make it into a different object with similar syntax.

It's not a matter of having them "both". It's a matter of having a return in a generator. This is an entirely separate and trivial-to-answer question.
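To make that concrete (this snippet is an illustration of the point above, not code from the quoted email): on Python 3.3+ a return with a value inside a generator is legal, and the value simply rides along on the StopIteration that ends the iteration.

def gen():
    yield 1
    return "boom!"          # on Python 3.3+ this becomes StopIteration("boom!")

g = gen()
print(next(g))              # 1
try:
    next(g)
except StopIteration as exc:
    print(exc.value)        # boom! -- the "returned" value, carried by the exception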

A Long Useless Rant

The email seems to contain an implicit assumption. It's the notion that programming language semantics are subtle and slippery things. And even "senior people" can't get it right. Because all programming languages (other than the email sender's personal favorite) are inherently confusing. The confusion cannot be avoided.

There are times when programming language semantics are confusing.  For example, the ++ operator in C is confusing. Nothing can be done about that. The original definition was tied to the PDP-11 machine instructions. Since then... Well.... Aspects of the generated code are formally undefined.  Many languages have one or more places where the semantics are "undefined" or only defined by example.

This is not one of those times.

Here's the real problem I have with the garbage aspect of the email.

If you bring personal baggage to the conversation -- i.e., assumptions based on a comparison between some other language and Python -- confusion will erupt all over the place. Languages are different. Concepts don't map from language to language very well. Yes, there are simple abstract principles which have different concrete realizations in different languages. But among the various concrete realizations, there may not be a simple mapping.

It's essential to discard all knowledge of all previous favorite programming languages when learning a new language.

I'll repeat that for the author of the email.

Don't Go To The Well With A Full Bucket.

You won't get anything.

In this specific case, the notion of "function" in Python is expanded to include two superficially similar things. The syntax is nearly identical. But the behaviors are remarkably different. It's essential to grasp the idea that the two things are different, and can't be casually lumped together as "function/iterator".

The crux of the email appears to be a failure to get the Python language rules in a profound way. 

August 26, 2016 07:22 AM


Vasudev Ram

Square spiral - drawing with 3D effect (turtle graphics)

By Vasudev Ram

I was doing some work with Python turtle graphics for a project, and came up with this simple program that draws a square-ish spiral in multiple colors. It has a bit of a 3D effect. You can see it as a pyramid with you above, the levels ascending toward you, or you can see it (again from above) as a well with steps, going downward.

Here is the code (see the original post for a screenshot of its output):

'''
square_spiral.py
A program that draws a "square spiral".
Author: Vasudev Ram
Copyright 2016 Vasudev Ram
Web site: https://vasudevram.github.io
Blog: http://jugad2.blogspot.com
Product store: https://gumroad.com/vasudevram
'''

import turtle
t = turtle

colors = ['blue', 'green', 'yellow', 'orange', 'red']

def pause():
    # raw_input() is Python 2; on Python 3, use input() instead.
    _ = raw_input("Press Enter to exit:")

def spiral(t, step, step_incr, angle):
    color_ind = 0
    colors_len = len(colors)
    t.pencolor(colors[color_ind])
    while True:
        t.forward(step)
        step = step + step_incr
        if step > 500:
            break
        t.right(angle)
        color_ind = (color_ind + 1) % colors_len
        t.pencolor(colors[color_ind])

    t.hideturtle()
    pause()

t.speed(0)
spiral(t, 20, 5, 90.2)


- Vasudev Ram - Online Python training and consulting

Get updates on my software products / ebooks / courses.

My Python posts     Subscribe to my blog by email

My ActiveState recipes



August 26, 2016 03:38 AM

August 25, 2016


Continuum Analytics News

Celebrating U.S. Women's Equality Day with Women in Tech

Posted Thursday, August 25, 2016

August 26 is recognized as Women's Equality Day in the United States, celebrating the addition of the 19th Amendment to the Constitution in 1920, which granted women the right to vote. This amendment was the culmination of an immense movement in women's rights, dating all the way back to the first women's rights convention in Seneca Falls, New York, in 1848. 

To commemorate this day, we decided to reach out to influential, successful and all around superstar women in technology to ask them one question: 

If women were never granted the right to vote, how do you think the landscape of women in STEM would be different?

Katy Huff, @katyhuff 

"If women were never granted the right to vote, I think it's fair to say that other important movements on the front lines of women's rights would not have followed either. Without that basic recognition of equality -- the ability to participate in democracy -- would we have ever seen Title VII of the Civil Rights Act (1964) or Title IX of the Education Amendments (1972)? Surely not. And without them, women could legally be discriminated against when seeking an education and then again later when seeking employment. There wouldn't merely be a minority of women in tech (as is currently the case) - there would be a super-minority. If there were any women at all able to compete for these lucrative jobs, that tiny minority could legally be paid less than their colleagues and looked upon as second class citizens without any more voice in the workplace than in their own democracy."

Renee M. P. Teate, @BecomingDataSci

"If women were never granted the right to vote in the U.S., the landscape of women in STEM would be very different, because the landscape of our entire country would be different. Voting is a basic right in a democracy, and it is important to allow citizens of all races, sexes/genders, religions, wealth statuses, and backgrounds to participate in electing our leaders, and therefore shaping the laws of our country. When anyone is excluded from participating, they are not represented and can be more easily marginalized or treated unfairly under the law.

The 19th amendment gave women not only a vote and a voice, but "full legal status" as citizens. That definitely impacts our roles in the workplace and in STEM, because if the law doesn't treat you as a whole and valued participant, you can't expect peers or managers to, either. Additionally, if the law doesn't offer equal protection to everyone, discrimination would run (even more) rampant and there might be no legal recourse for incidents such as sexual harassment in the workplace.

A celebration of women is important within STEM fields, because it wasn't long ago that women were not seen as able to be qualified for many careers in STEM, including roles hired by public/governmental organizations like NASA that are funded by taxpayers and report to our elected officials. Even today, there are many prejudices against women, including beliefs by some that women are inferior at performing jobs such as computer programming and scientific research. There are also institutional biases in both our educational system and the workplace that we still need to work on. When women succeed despite these additional barriers (not to mention negative comments by unsupportive people and other detractors), that is worth celebrating.

Though there are still many issues relating to bias against women and people of color in STEM, without the basic right to vote we would be even further behind on the quest for equality in the U.S. than we are today."

Carol Willing, @WillingCarol

"From the 19th amendment ratification to now, several generations of women have made their contributions to technical fields. These women celebrated successes, failures, disappointments, hopes, and dreams.

Sometimes, as a person in tech, I wonder if my actions make a difference on others. Is it worth the subtle putdowns, assumptions about my ability, and, at times, overt bullying to continue working as an engineer and software developer? Truthfully, sometimes the answer is no, but most days my feeling is “YES! I have a right to study and work on technical problems that I find fascinating." My daughter, my son, and you have that right too.

Almost a decade ago, I watched the movie “Iron Jawed Angels” with my middle school daughter, her friend, and a friend of mine who taught middle school history. The movie was tough to watch. We were struck by the sacrifice made by suffragettes, Alice Paul and Lucy Burns, amid the brutal abuse from others that did not want women to vote. A powerful reminder that we can’t control the actions of others, but we can stand up for ourselves and our right to be engineers, developers, managers, and researcher in technical fields. Your presence in tech and your contributions make a difference to humanity now and tomorrow."

Jasmine Sandhu, @sandhujasmine

"Its a numbers game, if more people have an opportunity to contribute to a field, you have a lot more talent, many more ideas and that many more people working on solutions and new ideas.

The "Science" in STEM is key - an informed citizenry that asks for evidence when confronted with the many pseudoscientific claims that we navigate in everday life is critical. It is important for all of us to learn the scientific method and see its relevance in day to day life, so we 'ask for evidence' when people around us make claims about our diet, about our health, our civic discourse, our politics. Similarly, I wish I had learned statistics since childhood. It is an idea with which we should be very comfortable. Randomness is a part of our daily lives and being able to make decisions and take risks based less on how we feel about things and be able to analyze critically the options would be wonderful. Of course, education has a far greater impact in our lives than simply the demographic that we represent in a field. I'm still struck by the pseudoscience books aimed at little girls (astrology) and the scientific books targetting the boys (astronomy) - of course, this is an anecdotal example, but in the US we still hear about girls losing interest in science and math in middle school. Hard to believe this is the case in the 21st century.

Living in a place like Seattle in the 21st century has enabled opportunities for me that don't exist for a lot of women in the world. I work remotely in a technical field, which gives me the freedom to structure my day to care for my daughter, live close to my family which is my support structure, and earn well enough to provide for my daughter and me. STEM fields offer yet more opportunities for all people, including women."

We loved hearing the perspectives of these women in STEM. If you'd like to share your response, please respond in the comments below, or tweet us @ContinuumIO!

We've also created a special Anaconda graphic to celebrate, which you can see below. If you're currently at PyData Chicago, find the NumFOCUS table to grab a sticker! 

Happy Women's Equality Day!

-Team Anaconda

August 25, 2016 08:42 PM


Import Python

ImportPython Issue 87


Worthy Read

video
A useful YouTube channel with short screencasts/videos for Python developers to subscribe to. I learned a couple of Sublime + Python tricks from it.

docker
This Dockerfile shows you how to build a Docker container with a fairly standard and speedy setup for Django with uWSGI and Nginx.

curated list
I have read some interesting Python tutorials lately. I would love to share them with you.


Try Hired and get in front of 4,000+ companies with one application. No more pushy recruiters, no more dead end applications and mismatched companies, Hired puts the power in your hands.
Sponsor

web framework
Kyōkai is a fast asynchronous Python server-side web framework. It is built upon asyncio and the Asphalt framework for an extremely fast web server.

python3
We recently upgraded our 160,000 lines of backend Python code from Python 2 to Python 3. We did it with zero downtime and no major errors! Here's how we did it; hopefully it will help anyone else still stuck on Python 2!

automation
Bangalore user group meet with Python Automation as the theme

Kickstarter Campaign for wxPython Cookbook.

podcast
What happens when you take a tech-driven online fashion company that is experiencing explosive growth and infuse it with a deep open-source mission? You'll find out on this episode of Talk Python To Me. We'll meet Lauri Apple and Rafael Caricio from Zalando, where developers have published almost 200 open source projects on GitHub.

django
There are many ways to handle permissions in a project. For instance, we may have model-level permissions, object-level permissions, fine-grained user permissions or role-based permissions. Either way, we don't need to write any of those from scratch; the Django ecosystem has a vast number of permission-handling apps that will help us with the task. In this post we will compare how some popular permission apps work so you know which one suits your project's needs.

image processing
Do you know what they are? If you are thinking of irrigation circles, you are wrong. Do not believe the lies of the conspirators. Those are, undoubtedly, proofs of extraterrestrial visitors on earth. As I want to be ready for the first contact I need to know where these guys are working. It should be easy with so many satellite images at hand. So I asked the machine learning experts around here to lend me a hand. Surprisingly, they refused. Mumbling I don’t know what about irrigation circles. Very suspicious. But something else they mentioned is that a better initial approach would be to use some computer-vision detection technique. Note - Code is here https://github.com/machinalis/satimg/blob/master/Searching%20for%20aliens.ipynb

community
Hopefully this post gave you some insight into why you should consider giving Python a go. This post is coming from someone who feels “guilty” for not having spoken so well of Python in the past and is now all aboard the hype train. In my defense, it was just a “personal preference” thing; when people asked me which language they should learn first, for instance, I usually suggested Python.



Upcoming Conference / User Group Meet








Projects

fuzzer - 82 Stars, 7 Fork
A Python interface to AFL, allowing for easy injection of testcases and other functionality.

MEAnalyzer - 31 Stars, 6 Fork
Intel Engine Firmware Analysis Tool

pybble - 24 Stars, 1 Fork
Python on Pebble

tensorflow_demo - 6 Stars, 2 Fork
Tensorflow Demo for my TF in 5 Min Video on Youtube

washer - 5 Stars, 0 Fork
A whoosh-based CLI indexer and searcher for your files.

August 25, 2016 04:40 PM


Continuum Analytics News

Succeeding in the New World Order of Data

Posted Thursday, August 25, 2016
Travis Oliphant
Chief Executive Officer & Co-Founder

"If you want to understand function, study structure."

Sage advice from Francis Crick, who revolutionized genetics with his Nobel Prize winning co-discovery of the structure of DNA — launching more than six decades of fruitful research.

Crick was referring to biology, but today's companies competing in the Big Data space should heed his advice. With change at a pace this intense, understanding and optimizing one’s data science infrastructure — and therefore functionality — makes all the difference.

But, what’s the best way to do that?

Fortunately, there's an ideal solution for evolving in a rapidly-changing context while generating competitive insights from today's deluge of data.

That solution is an emerging movement called Open Data Science, which uses open source software to drive cutting-edge analytics that go far beyond what traditional proprietary data software can provide.

Shoring up Your Infrastructure

Open Data Science draws its power from four fundamental principles: accessibility, innovation, interoperability and transparency. These ensure source code that’s accessible to the whole team — free from licensing restrictions or vendor release schedules — and works seamlessly with other tools.

Because open source libraries are free, the barrier to entry is very low, allowing teams to dive in and freely experiment without the concerns of a massive financial commitment up front, which encourages innovation.

Although transitioning to a new analytics infrastructure is never trivial, the community spirit of open source software and Open Data Science's commitment to interoperability makes it quite manageable.

Anaconda, for example, provides over 720 well-tested Python libraries for the demands of today's data science, all available from a single install. Business analysts can be brought on board with Anaconda Fusion, providing access to data analysis functions in Python within the familiar Excel interface.

With connectors to other languages, integration of legacy code, HPC and parallel computing, as well as visualizations easily deployed to the web, there’s no limit to what can be achieved with Open Data Science. 

Navigating Potential Pitfalls

With traditional solutions, unforeseen limits can bring the train to a screeching halt.

I know of a large government project that convened many experts to creatively solve problems using data. The agency had invested in a many-node compute cluster with attached GPUs. But when the experts arrived, the software installed was not inclusive and allowed fewer than a third of them to actually use it.

Organizations cannot simply buy the latest monolithic tech from vendors and expect data science to just happen. The software must enable data scientists and play to their strengths, not only to the needs of IT operations.

Unlike proprietary offerings, Open Data Science has evolved along with the Big Data revolution —and, to a significant extent, driven it. Its toolset is designed with compatibilities that drive progress.

Setting up Your Scaffolding

Making the shift to an Open Data Science infrastructure is more than just choosing software and databases. It must also include people.

Companies should provision the time and resources necessary to set up new organizational structures and provide budgets to enable these groups to work effectively.  A pilot data-exploration team, a center of excellence or an emerging technology team are all examples of models that enable organizations to begin to uncover the opportunity in their data.  As the organization grows, individual roles may change or new ones may emerge.

Details of which toolsets to use will need to be hammered out. Many developers are already familiar with common Open Data Science applications, such as data notebooks like Jupyter, while others may require more of a learning curve to implement.

Choices such as programming languages will vary by developers' preferences and particular needs. Python is commonly used, and for good reason. It is, by far, the dominant language for scientific computing, and it integrates beautifully with Open Data Science.

Finally, well-managed migration is critical to success. Open Data Science allows for a number of options — from "co-existence" of Open Data Science with current infrastructure to piecemeal, or even full migration, all depending on a company's tolerance for risk or willingness to commit. Legacy code can also be retained and integrated with Open Data Science wrappers, allowing old but debugged and stable code-bases to serve new duty in a modern analytics environment.

Taking Data Science to a New Level

When genetics boomed as a science in the 1950s, new insights were always on the way. But, to get the ball rolling, biologists needed to understand DNA's structure — and exploit that understanding. Francis Crick and others began the process, and society continues to benefit.

Data Science is similarly poised on the cusp of an astounding future. Those organizations that understand their analytics infrastructure will excel in that new world, with Open Data Science as the instrument for success.

August 25, 2016 04:31 PM


Python Anywhere

Latest deploy: Some nice new features and a surprise

Rename web apps

Yes, we know it's been a long time coming, but now you can rename your web apps (and, as a result, change the domain they're served from) right on the web app setup page. Look for the little edit pencil icon next to your web app address.

Students can share with teacher

We've made it easier for students to share their consoles with their teacher.

List invoices on accounts page

For those of you that may be wondering how much of your hard-earned money you've spent on PythonAnywhere, we've added a list of all of your invoices to the Account page.

PDF export for Jupyter notebooks works

A helpful user pointed out that "Download as PDF" wasn't working in Jupyter notebooks on PythonAnywhere. So we fixed it.

"bash console here" on editor page

If you're ever editing a file and want to open a Bash console in the same directory as the file, now you can.

General security, usability and stability fixes

As usual. This is usually where we put all the fixes for bugs that are too embarrassing to list.

Something great that we're not telling you anything about

until we've tested it ourselves.

August 25, 2016 02:09 PM


Python Software Foundation

PyCon APAC - Bringing us together

Two weekends ago I was lucky enough to get the chance to attend PyCon APAC 2016. This year the event was held in Seoul, South Korea at the COEX Convention Center within the Gangnam-gu district. PyCon APAC 2016 brought 1,500 Pythonistas together and was organized by the PyCon Korea team. This was a very special trip for me as it was my first trip to Asia. On the first day, while we were figuring out the public transportation system, I did experience some brief challenges.



However, the following days at the conference settled my disorientation. Through this process, I realized that the same Python community qualities existed in South Korea as they do everywhere else in the world. We all may not have been able to communicate verbally, but the openness of the community still prevailed. The locals were welcoming, inclusive, and took the time to teach us Korean customs and culture. More than that, PyCon APAC 2016 stressed diversity of nationality and gender. One great way that the conference made everyone feel like they were part of the community was this sign that comprised all the names of the people who had pre-registered for the conference.



This meaningful sign had such a positive impact on the attendees as it acted as a constant reminder. I enjoyed watching attendees find their names in the sign, and all of the tweets that followed.

Through experiencing PyCon APAC, I also learned that the organizers spend a great deal of effort making their community strong and open. At the conference I was invited to attend the PyCon APAC organizers' meetings. During this meeting, the organizers addressed important questions such as "Do we continue PyCon APAC even though many APAC countries organize their own PyCon?" and "How do we continue to increase diversity?" It was decided during the meeting that the purpose of PyCon APAC goes beyond regional conferences and should continue. It helps build diversity and brings forth positive influences from other parts of the world. The organizers decided that each location should attempt to have a small portion of their budget set aside to send some of their community members to other “Indo-Asia-Pacific” regional conferences, especially the yearly APAC conference itself. Hearing how the team of organizers valued such questions and discussions showed me that they valued our community and that is one reason why their conferences are so successful.



Beyond community importance, the conference brought us together to discuss core Python development. Some of the questions I heard at the PSF booth were, "When will Python 2.7.x stop being supported?" and "What will happen to those of us that use 2.7.x in a corporate setting?" Their questions were based on PEP 373 and PEP 494, and their worries were relevant ones. Many think that Python 3.5.x still needs a lot more work before developers no longer need Python 2. Those questions are hard, and no one has an absolute answer, no matter how strong their beliefs. But our discussions led to how we all need to work on making Python 3.x better, since it is the future of the language. We discussed the need to port packages from Python 2 to 3, and the need for corporate support.



Regardless of the Python 2 vs Python 3 debate, the attendees were excited to get coding during the PyCon APAC Sprints. This was the first time the PyCon Korea team held sprints, and they did not know how many sprinters to expect. They were overwhelmed when the day came and had to book additional space to accommodate everyone. As an organizer, I can tell you that this is a good problem to have, especially when the organizers react properly and swiftly.


During the Sprints/Tutorial day, Pythonistas attended a sprint about Pandas & PyData led by one of the creators of the pandas project, Wes McKinney. The picture above shows hands-on learning at the tutorial for DjangoCupcake. Others attended sessions about the Django Rest Framework, Write the Docs, Tox, Travis, and aiohttp led by Andrew Svetlov, a core Python developer.

Establishing connections with Pythonistas from the APAC region and beyond made the long flights to and from Seoul worth every minute. I hope to attend future PyCon APACs and reconnect with all the wonderful people I met during the conference. Thank you, organizers and attendees, for a memorable conference!


August 25, 2016 10:14 AM


PyCharm

Announcing PyCharm 2016.2.2

PyCharm 2016.2.2 is now available from the download page. Soon it will also be available as a patch update from within the IDE (from v2016.2.1).

With this update, we’ve fixed several major problems in the debugger and in the code analysis subsystem. The Release Notes lists all fixes for this update.

Download PyCharm 2016.2.2 for your platform from our website and please report any problems you find to the Issue Tracker.

If you’d like to discuss your experiences with PyCharm, we look forward to your feedback in comments to this post and on Twitter.

Your PyCharm Team
The Drive to Develop

August 25, 2016 09:45 AM


tryexceptpass

I’m glad you had a chance to go down this path.

Looks like you chose a more formal MVC approach and a django-esque structure, which seems to be what the world uses most these days, so it…

August 25, 2016 06:03 AM


Matthew Rocklin

Supporting Users in Open Source

What are the social expectations of open source developers to help users understand their projects? What are the social expectations of users when asking for help?

As part of developing Dask, an open source library with growing adoption, I directly interact with users over GitHub issues for bug reports, StackOverflow for usage questions, a mailing list and live Gitter chat for community conversation. Dask is blessed with awesome users. These are researchers doing very cool work of high impact and with novel use cases. They report bugs and usage questions with such skill that it’s clear that they are Veteran Users of open source projects.

Veteran Users are Heroes

It’s not easy being a veteran user. It takes a lot of time to distill a bug down to a reproducible example, or a question into an MCVE, or to read all of the documentation to make sure that a conceptual question definitely isn’t answered in the docs. And yet this effort really shines through and it’s incredibly valuable to making open source software better. These distilled reports are arguably more important than fixing the actual bug or writing the actual documentation.

Bugs occur in the wild, in code that is half related to the developer’s library (like Pandas or Dask) and half related to the user’s application. The veteran user works hard to pull away all of their code and data, creating a gem of an example that is trivial to understand and run anywhere that still shows off the problem.

This way the veteran user can show up with their problem to the development team and say “here is something that you will quickly understand to be a problem.” On the developer side this is incredibly valuable. They learn of a relevant bug and immediately understand what’s going on, without having to download someone else’s data or understand their domain. This switches from merely convenient to strictly necessary when the developers deal with 10+ such reports a day.

Novice Users need help too

However there are a lot of novice users out there. We have all been novice users once, and even if we are veterans today we are probably still novices at something else. Knowing what to do and how to ask for help is hard. Having the guts to walk into a chat room where people will quickly see that you’re a novice is even harder. It’s like using public transit in a deeply foreign language. Respect is warranted here.

I categorize novice users into two groups:

  1. Experienced technical novices, who are very experienced in their field and technical things generally, but who don’t yet have a thorough understanding of open source culture and how to ask questions smoothly. They’re entirely capable of behaving like a veteran user if pointed in the right directions.
  2. Novice technical novices, who don’t yet have the ability to distill their problems into the digestible nuggets that open source developers expect.

In the first case of technically experienced novices, I’ve found that being direct works surprisingly well. I used to be apologetic in asking people to submit MCVEs. Today I’m more blunt but surprisingly I find that this group doesn’t seem to mind. I suspect that this group is accustomed to operating in situations where other people’s time is very costly.

The second case of novice novice users are more challenging for individual developers to handle one-by-one, both because novices are more common, and because solving their problems often requires more time commitment. Instead open source communities often depend on broadcast and crowd-sourced solutions, like documentation, StackOverflow, or meetups and user groups. For example in Dask we strongly point people towards StackOverflow in order to build up a knowledge-base of question-answer pairs. Pandas has done this well; almost every Pandas question you Google leads to a StackOverflow post, handling 90% of the traffic and improving the lives of thousands. Many projects simply don’t have the human capital to hand-hold individuals through using the library.

In a few projects there are enough generous and experienced users that they’re able to field questions from individual users. SymPy is a good example here. I learned open source programming within SymPy. Their community was broad enough that they were able to hold my hand as I learned Git, testing, communication practices and all of the other soft skills that we need to be effective in writing great software. The support structure of SymPy is something that I’ve never experienced anywhere else.

My Apologies

I’ve found myself becoming increasingly impolite when people ask me for certain kinds of extended help with their code. I’ve been trying to track down why this is and I think that it comes from a mismatch of social contracts.

Large parts of technical society have an (entirely reasonable) belief that open source developers are available to answer questions about how we use their project. This was probably true in popular culture, where our stereotypical image of an open source developer was working out of their basement long into the night on things that relatively few enthusiasts bothered with. They were happy to engage and had the free time in which to do it.

In some ways things have changed a lot. We now have paid professionals building software that is used by thousands or millions of users. These professionals easily charge consulting fees of hundreds of dollars per hour for exactly the kind of assistance that people show up expecting for free under the previous model. These developers have to answer for how they spend their time when they’re at work, and when they’re not at work they now have families and kids that deserve just as much attention as their open source users.

Both of these cultures, the creative do-it-yourself basement culture and the more corporate culture, are important to the wonderful surge we’ve seen in open source software. How do we balance them? Should developers, like doctors or lawyers perform pro-bono work as part of their profession? Should grants specifically include paid time for community engagement and outreach? Should users, as part of receiving help feel an obligation to improve documentation or stick around and help others?

Solutions?

I’m not sure what to do here. I feel an obligation to remain connected with users from a broad set of applications, even those that companies or grants haven’t decided to fund. However at the same time I don’t know how to say “I’m sorry, I simply don’t have the time to help you with your problem.” in a way that feels at all compassionate.

I think that people should still ask questions. I think that we need to foster an environment in which developers can say “Sorry. Busy.” more easily. I think that we as a community need better resources to teach novice users to become veteran users.

One positive approach is to honor veteran users, and through this public praise to encourage other users to “up their game”, much as developers do today with coding skills. There are thousands of blogposts about how to develop code well, and people strive tirelessly to improve themselves. My hope is that by attaching the language of skill, like the term “veteran”, to user behaviors we can create an environment where people are proud of how cleanly they can raise issues and how clearly they can describe questions for documentation. Doing this well is critical for a project’s success and requires substantial effort and personal investment.

August 25, 2016 12:00 AM

August 24, 2016


Nikola

Automating Nikola rebuilds with Travis CI

In this guide, we’ll set up Travis CI to rebuild a Nikola website and host it on GitHub Pages.

Why?

By using Travis CI to build your site, you can easily blog from anywhere you can edit text files, which means you can blog with only a web browser and GitHub.com, or try a service like Prose.io. You also won't need to install Nikola and Python to write, or even a real computer: a mobile phone could probably access one of those services and let you write something.

Caveats

  • The build might take a couple minutes to finish (1:30 for the demo site; YMMV)
  • When you commit and push to GitHub, the site will be published unconditionally. If you don’t have a copy of Nikola for local use, there is no way to preview your site.

What you need

  • A computer for the initial setup that can run Nikola and the Travis CI command-line tool (written in Ruby) — you need a Unix-like system (Linux, OS X, *BSD, etc.); Windows users should try Bash on Ubuntu on Windows (available in Windows 10 starting with Anniversary Update) or a Linux virtual machine.
  • A GitHub account (free)
  • A Travis CI account linked to your GitHub account (free)

Setting up Nikola

Start by creating a new Nikola site and customizing it to your liking. Follow the Getting Started guide. You might also want to add support for other input formats, namely Markdown, but this is not a requirement (unless you want to use Prose.io).

After you’re done, you must configure deploying to GitHub in Nikola. Make your first deployment from your local computer and make sure your site works right. Don’t forget to set up .gitignore. Moreover, you must set GITHUB_COMMIT_SOURCE = False — otherwise, Travis CI will go into an infinite loop.
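
For reference, the deployment-related values in conf.py might look roughly like the sketch below. The branch names here are illustrative and depend on whether you are deploying a user/organization page or a project page; only GITHUB_COMMIT_SOURCE = False is required for this guide.

# Illustrative conf.py values for GitHub deployment (adjust to your setup)
GITHUB_SOURCE_BRANCH = 'src'      # branch holding the Nikola sources
GITHUB_DEPLOY_BRANCH = 'master'   # 'master' for user pages, usually 'gh-pages' for project pages
GITHUB_REMOTE_NAME = 'origin'

# Must be False, otherwise the deploy commit re-triggers Travis CI in an infinite loop
GITHUB_COMMIT_SOURCE = False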

If everything works, you can make some change to your site (so you see that rebuilding works), but don’t commit it just yet.

Setting up Travis CI

Next, we need to set up Travis CI. To do that, make sure you have the ruby and gem tools installed on your system. If you don’t have them, install them from your OS package manager.

First, download/copy the .travis.yml file (note the dot in the beginning; the downloaded file doesn’t have it!) and adjust the real name, e-mail (used for commits; line 12/13), and the username/repo name on line 21. If you want to render your site in another language besides English, add the appropriate Ubuntu language pack to the list in this file.

travis.yml

# Travis CI config for automated Nikola blog deployments
language: python
cache: apt
sudo: false
addons:
  apt:
    packages:
    - language-pack-en-base
python:
- 3.5
before_install:
- git config --global user.name 'Travis CI'
- git config --global user.email 'travis@invalid'
- git config --global push.default 'simple'
- pip install --upgrade pip wheel
- echo -e 'Host github.com\n    StrictHostKeyChecking no' >> ~/.ssh/config
- eval "$(ssh-agent -s)"
- chmod 600 id_rsa
- ssh-add id_rsa
- git remote rm origin
- git remote add origin git@github.com:USERNAME/REPO.git
- git fetch origin master
- git branch master FETCH_HEAD
install:
- pip install 'Nikola[extras]'
script:
- nikola build && nikola github_deploy -m 'Nikola auto deploy [ci skip]'
notifications:
    email:
        on_success: change
        on_failure: always

Next, we need to generate a SSH key for Travis CI.

echo id_rsa >> .gitignore
echo id_rsa.pub >> .gitignore
ssh-keygen -C TravisCI -f id_rsa -N ''

Open the id_rsa.pub file and copy its contents. Go to GitHub → your page repository → Settings → Deploy keys and add it there. Make sure Allow write access is checked.

And now, time for our venture into the Ruby world. Install the travis gem:

gem install --user-install travis

You can then use the travis command if you have configured your $PATH for RubyGems; if you haven't, the tool will output the path to use (e.g. ~/.gem/ruby/2.0.0/bin/travis).

We’ll use the Travis CI command-line client to log in (using your GitHub password), enable the repository and encrypt our SSH key. Run the following three commands, one at a time (they are interactive):

travis login
travis enable
travis encrypt-file id_rsa --add

Commit everything to GitHub:

git add .
git commit -am "Automate builds with Travis CI"

Hopefully, Travis CI will build your site and deploy. Check the Travis CI website or your e-mail for a notification. If there are any errors, make sure you followed this guide to the letter.

August 24, 2016 06:05 PM


Python Piedmont Triad User Group

PYPTUG Monthly meeting August 30 2016 (flask-restplus, openstreetmap)

Come join PYPTUG at our next monthly meeting (August 30th 2016) to learn more about the Python programming language, modules and tools. Python is the perfect language to learn if you've never programmed before, and at the other end, it is also the perfect tool that no expert would do without. Monthly meetings are in addition to our project nights.




What

Meeting will start at 6:00pm.

We will open with an intro to PYPTUG and how to get started with Python, PYPTUG activities and members' projects, in particular some updates on the Quadcopter project, then move on to news from the community.

Then on to the main talk.

 
Main Talk: Building a RESTful API with Flask-Restplus and Swagger
by Manikandan Ramakrishnan
Bio:
Manikandan Ramakrishnan is a Data Engineer with Inmar Inc.

Abstract:
Building an API and documenting it properly is like having a cake and eating it too. Flask-RESTPlus is a great Flask extension that makes it really easy to build robust REST APIs quickly with minimal setup. With its built-in Swagger integration, it is extremely simple to document the endpoints and to enforce request/response models.
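
To give a flavor of what that looks like, here is a minimal, illustrative sketch of a Flask-RESTPlus resource with a documented response model. The resource and field names are made up for this example, not taken from the talk.

from flask import Flask
from flask_restplus import Api, Resource, fields

app = Flask(__name__)
api = Api(app, version='1.0', title='Demo API',
          description='Swagger UI is served automatically at the API root')

# The model both documents the response in Swagger and marshals the output
todo_model = api.model('Todo', {
    'id': fields.Integer(readOnly=True, description='Unique identifier'),
    'task': fields.String(required=True, description='Task description'),
})

@api.route('/todos/<int:todo_id>')
class Todo(Resource):
    @api.marshal_with(todo_model)
    def get(self, todo_id):
        # A real app would look this up in a database
        return {'id': todo_id, 'task': 'prepare a lightning talk'}

if __name__ == '__main__':
    app.run(debug=True)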

Lightning talks! 


We will have some time for extemporaneous "lightning talks" of 5-10 minute duration. If you'd like to give one, some talk suggestions were provided here in case you are looking for inspiration. Or talk about a project you are working on.
One lightning talk will cover OpenStreetMap

When

Tuesday, August 30th 2016
Meeting starts at 6:00PM

Where

Wake Forest University, close to Polo Rd and University Parkway:

Wake Forest University, Winston-Salem, NC 27109

 Map this

See also this campus map (PDF) and also the Parking Map (PDF) (Manchester hall is #20A on the parking map)

And speaking of parking:  Parking after 5pm is on a first-come, first-serve basis.  The official parking policy is:
"Visitors can park in any general parking lot on campus. Visitors should avoid reserved spaces, faculty/staff lots, fire lanes or other restricted area on campus. Frequent visitors should contact Parking and Transportation to register for a parking permit."

Mailing List


Don't forget to sign up to our user group mailing list:

https://groups.google.com/d/forum/pyptug?hl=en

It is the only step required to become a PYPTUG member.

RSVP on meetup:
https://www.meetup.com/PYthon-Piedmont-Triad-User-Group-PYPTUG/events/233095834/

August 24, 2016 03:28 PM


Machinalis

Searching for aliens

First contact

Have you ever seen through a plane’s window, or in Google Maps, some precisely defined circles on the Earth? Typically many of them, close to each other? Something like this:

Do you know what they are? If you are thinking of irrigation circles, you are wrong. Do not believe the lies of the conspirators. Those are, undoubtedly, proofs of extraterrestrial visitors on earth.

As I want to be ready for the first contact I need to know where these guys are working. It should be easy with so many satellite images at hand.

So I asked the machine learning experts around here to lend me a hand. Surprisingly, they refused. Mumbling I don’t know what about irrigation circles. Very suspicious. But something else they mentioned is that a better initial approach would be to use some computer-vision detection technique.

So, there you go. Those damn conspirators gave me the key.

Aliens

Circles detection

So now, in the Python ecosystem, computer vision means OpenCV. And as it happens, this library has a HoughCircles function which finds circles in an image. Not surprising: OpenCV has a bazillion useful functions like that.

Let's make it happen.

First of all, I’m going to use Landsat 8 data. I’ll choose scene 229/82 for two reasons:

  • I know it includes circles, and
  • it includes my house (I want to meet the extraterrestrials living close by, not those in Area 51)
Crop of the Landsat 8 scene 229/82

The first issue I have to solve is that the HoughCircles function

finds circles in a grayscale image using a modification of the Hough transform

Well, grayscale does not exactly match multi-band Landsat 8 data, but each one of the bands can be treated as a single grayscale image. Now, a circle can express itself differently in different bands, because each band has its own way to sense the earth. So, the detector can define slightly different center coordinates for the same circle. For that reason, if two centers are too close then I’m going to keep only one of them (and discard the other as repeated).

Next, I need to determine the maximum and minimum circle radius. Typically, those circles' sizes vary from 400 m up to 800 m. That is between 13 and 26 Landsat pixels (30 m each). That's a starting point. For the rest of the parameters I'll just play around and try different values (not very scientific, I'm sorry).
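
For the curious, here is a rough sketch of how that per-band detection could look, assuming each band has already been loaded as an 8-bit NumPy array. The parameter values are illustrative; the real ones live in the notebook linked below.

import cv2
import numpy as np

def detect_circles(band, min_radius=13, max_radius=26):
    # cv2.HOUGH_GRADIENT is the OpenCV 3 name; older versions use cv2.cv.CV_HOUGH_GRADIENT
    blurred = cv2.medianBlur(band, 5)  # smooth speckle before detection
    circles = cv2.HoughCircles(blurred, cv2.HOUGH_GRADIENT, dp=1, minDist=20,
                               param1=100, param2=30,
                               minRadius=min_radius, maxRadius=max_radius)
    return [] if circles is None else list(circles[0])

def merge_centers(circles, min_separation=10):
    # keep only one circle when two detected centers are too close together
    kept = []
    for x, y, r in circles:
        if all(np.hypot(x - kx, y - ky) >= min_separation for kx, ky, _ in kept):
            kept.append((x, y, r))
    return kept

# bands would be a list of 2D uint8 arrays, one per Landsat band:
# detections = merge_centers([c for band in bands for c in detect_circles(band)])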

So I run my script (which you can see in this Jupyter notebook) and without too much effort I can see that the circles are detected:

Crop of the Landsat 8 scene 229/82, with the detected circles (the colors of the circles correspond to the size).

By changing the parameters I get to detect more circles (getting more false positives) or fewer circles (missing some real ones). As usual, there's a trade-off there.

Filter-out false positives

These circles only make sense in farming areas. If I configure the program not to miss real circles, then I get a lot of false positives. There are too many detected circles in cities, clouds, mountains, around rivers, etc.

Crop of the Landsat 8 scene 229/82, with detected circles and labels

That’s a whole new problem that I will need to solve. I can use vegetation indices, texture computation, machine learning. There’s a whole battery of possibilities to explore. Intuition, experience, domain knowledge, good data-science practices and ufology will help me out here. Unluckily, all that is out of the scope of this post.

So, my search for aliens will continue.

Mental disorder disclaimer

I hope it's clear that the whole search-for-aliens story is fictional. Just an amusing way to present the subject.

That clarified, the technical aspects of the post are still valid.

To help our friends at Kilimo, we developed an irrigation circle detector prototype. As hinted before, instead of approaching the problem with machine learning we attacked it using computer vision techniques.

Please, feel free to comment, contact us or whatever. I’m @py_litox on Twitter.

August 24, 2016 02:19 PM


Hynek Schlawack

Hardening Your Web Server’s SSL Ciphers

There are many wordy articles on configuring your web server’s TLS ciphers. This is not one of them. Instead I will share a configuration which is both compatible enough for today’s needs and scores a straight “A” on Qualys’s SSL Server Test.

August 24, 2016 01:49 PM


pythonwise

Generate Relation Diagram from GAE ndb Model

Working with GAE, we wanted to create a relation diagram from our ndb model. By deferring the rendering to dot and using Python's reflection, this became an easy task. Some links are still missing since we're using ancestor queries, but this can be handled by some class docstring syntax or just by manually editing the resulting dot file.
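
As a minimal sketch of the idea (not the actual script), walking the ndb model classes with reflection and emitting dot might look like this; the edge-following via KeyProperty is an assumption about how the links are declared.

from google.appengine.ext import ndb

def models_to_dot(models):
    lines = ['digraph models {', '  node [shape=record];']
    for model in models:
        # _properties maps attribute names to ndb Property instances
        fields = '|'.join(sorted(model._properties))
        lines.append('  {0} [label="{{{0}|{1}}}"];'.format(model.__name__, fields))
        for name, prop in model._properties.items():
            # KeyProperty links give us edges between models
            if isinstance(prop, ndb.KeyProperty) and prop._kind:
                lines.append('  {0} -> {1} [label="{2}"];'.format(
                    model.__name__, prop._kind, name))
    lines.append('}')
    return '\n'.join(lines)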

August 24, 2016 06:51 AM


Codementor

Asynchronous Tasks using Celery with Django

celery with django

Prerequisites

Introduction

Celery is a task queue based on distributed message passing. It is used to handle long-running asynchronous tasks. RabbitMQ, on the other hand, is a message broker that is used by Celery to send and receive messages. Celery is perfectly suited for tasks which will take some time to execute, but we don't want our requests to be blocked while these tasks are processed. Cases in point are sending emails, SMSes, making remote API calls, etc.

Target

Using the Local Django Application

Local File Directory Structure

We are going to use a Django application called mycelery. Our directory is structured in this way; the root of our Django application is 'mycelery'.

Add the following lines to the settings.py file to tell Celery that we will use RabbitMQ as our message broker and accept data in JSON format.

BROKER_URL = 'amqp://guest:guest@localhost//'
CELERY_ACCEPT_CONTENT = ['json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'

CELERY_ACCEPT_CONTENT is the list of content types the worker is allowed to receive.
CELERY_TASK_SERIALIZER is a string identifying the default serialization method for task messages.
CELERY_RESULT_SERIALIZER is the serialization format for task results.

After adding the message broker, add the following lines to a new file, celery.py, which tells Celery to use the settings defined in settings.py above.

from __future__ import absolute_import
import os
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'mycelery.settings')

from django.conf import settings
from celery import Celery

app = Celery('mycelery',
             backend='amqp',
             broker='amqp://guest@localhost//')

# This reads, e.g., CELERY_ACCEPT_CONTENT = ['json'] from settings.py:
app.config_from_object('django.conf:settings')

# For autodiscover_tasks to work, you must define your tasks in a file called 'tasks.py'.
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)

@app.task(bind=True)
def debug_task(self):
    print("Request: {0!r}".format(self.request))

Create tasks in Celery

Create an app named myceleryapp and make a tasks.py file in this app’s folder. All the tasks will be defined in this file.

In tasks.py, we are just playing with some numbers so that it behaves like a long-running task for now.

from celery import shared_task, current_task
from numpy import random
from scipy.fftpack import fft

@shared_task
def fft_random(n):
    for i in range(n):
        x = random.normal(0, 0.1, 2000)
        y = fft(x)
        if i % 30 == 0:
            process_percent = int(100 * float(i) / float(n))
            current_task.update_state(state='PROGRESS',
                                      meta={'process_percent': process_percent})
    return random.random()

Using the current_task.update_state() method, we record how much of the task has completed every 30 iterations, so that its progress can be queried later.

Calling tasks in Django

To call the above task, the following lines of code are required. You can put these lines wherever you want to call the task from.

from .tasks import fft_random
job = fft_random.delay(int(n))

Import the method and make the call. That's it! Now your operation is running in the background.
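
If the task is started from an AJAX call, a hypothetical view like the one below (the view name and parameter are illustrative, not part of the original tutorial) could kick it off and return the task id that the JavaScript will later poll with:

import json

from django.http import HttpResponse

from .tasks import fft_random

def start_task(request):
    n = request.POST.get('n', 1000)
    job = fft_random.delay(int(n))
    # hand the task id back to the page so it can poll for progress
    return HttpResponse(json.dumps({'task_id': job.id}),
                        content_type='application/json')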

Get the status of the task

To get the status of the task above, define the following method in views.py

import json

from celery.result import AsyncResult
from django.http import HttpResponse

# Create your views here.
def task_state(request):
    data = 'Fail'
    if request.is_ajax():
        if 'task_id' in request.POST.keys() and request.POST['task_id']:
            task_id = request.POST['task_id']
            task = AsyncResult(task_id)
            data = task.result or task.state
        else:
            data = 'No task_id in the request'
    else:
        data = 'This is not an ajax request'

    json_data = json.dumps(data)
    return HttpResponse(json_data, content_type='application/json')

The task_id is sent to the method from the JavaScript. This method checks the status of the task with id task_id and returns it in JSON format to the JavaScript. We can then call this method from our JavaScript and show a corresponding progress bar.

Conclusion

We can use Celery to run different types of tasks, from sending emails to scraping a website. In the case of long-running tasks, we'd like to show the status of the task to our user, and we can use a simple JavaScript progress bar which polls the task status URL and updates as the task progresses. With the help of Celery, a user's experience on Django websites can be improved dramatically.

Other tutorials you might also be interested in:

August 24, 2016 06:36 AM


Reuven Lerner

Fun with floats

I’m in Shanghai, and before I left to teach this morning, I decided to check the weather.  I knew that it would be hot, but I wanted to double-check that it wasn’t going to rain — a rarity during Israeli summers, but not too unusual in Shanghai.

I entered “shanghai weather” into DuckDuckGo, and got the following:

Never mind that it gave me a weather report for the wrong Chinese city. Take a look at the humidity reading!  What’s going on there?  Am I supposed to worry that it’s ever-so-slightly more humid than 55%?

The answer, of course, is that many programming languages have problems with floating-point numbers.  Just as there’s no terminating decimal number to represent 1/3, lots of numbers are non-terminating when you use binary, which computers do.

As a result, floats are inaccurate.  Just add 0.1 + 0.2 in many programming languages, and prepare to be astonished.  Wait, you don’t want to fire up a lot of languages? Here, someone has done it for you: http://0.30000000000000004.com/ (I really love this site.)
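
In Python, for example:

>>> 0.1 + 0.2
0.30000000000000004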

If you’re working with numbers that are particularly sensitive, then you shouldn’t be using floats. Rather, you should use integers, or use something like Python’s decimal.Decimal, which guarantees accuracy at the expense of time and space. For example:

>>> from decimal import Decimal
>>> x = Decimal('0.1')
>>> y = Decimal('0.2')
>>> x + y
Decimal('0.3')
>>> float(x+y)
0.3

Of course, you should be careful not to create your decimals with floats:

>>> x = Decimal(0.1)
>>> y = Decimal(0.2)
>>> x + y
Decimal('0.3000000000000000166533453694')

Why is this the case? Let’s take a look:

>>> x
Decimal('0.1000000000000000055511151231257827021181583404541015625')

>>> y
Decimal('0.200000000000000011102230246251565404236316680908203125')

So, if you’re dealing with sensitive numbers, be sure not to use floats! And if you’re going outside in Shanghai today, it might be ever-so-slightly less humid than your weather forecast reports.

The post Fun with floats appeared first on Lerner Consulting Blog.

August 24, 2016 03:16 AM