
Planet Python

Last update: September 20, 2017 07:47 PM

September 20, 2017


DataCamp

New Course: Case Studies in Statistical Thinking!

Hello everyone! We just launched Case Studies in Statistical Thinking by Justin Bois, our latest Python course!

Mastery requires practice. Having completed Statistical Thinking I and II, you developed your probabilistic mindset and the hacker stats skills to extract actionable insights from your data. Your foundation is in place, and now it is time to practice your craft. In this course, you will apply your statistical thinking skills (exploratory data analysis, parameter estimation, and hypothesis testing) to two new real-world data sets. First, you will explore data from the 2013 and 2015 FINA World Aquatics Championships, where you will quantify the relative speeds and variability among swimmers. You will then perform a statistical analysis to assess the "current controversy" of the 2013 Worlds, in which swimmers claimed that a slight current in the pool was affecting results. Second, you will study the frequency and magnitudes of earthquakes around the world. Finally, you will analyze the changes in seismicity in the US state of Oklahoma after the practice of high-pressure wastewater injection at oil extraction sites became commonplace in the last decade. As you work with these data sets, you will take vital steps toward mastery as you cement your existing knowledge and broaden your abilities to use statistics and Python to make sense of your data.

Take me to chapter 1!

Case Studies in Statistical Thinking features interactive exercises that combine high-quality video, in-browser coding, and gamification for an engaging learning experience that will make you an expert in applied statistical thinking!

What you'll learn

1. Fish sleep and bacteria growth: A review of Statistical Thinking I and II

To begin, you'll use two data sets from Caltech researchers to rehash the key points of Statistical Thinking I and II to prepare you for the following case studies!

2. Analysis of results of the 2015 FINA World Swimming Championships

In this chapter, you will practice your EDA, parameter estimation, and hypothesis testing skills on the results of the 2015 FINA World Swimming Championships.

3. The "Current Controversy" of the 2013 World Championships

Some swimmers said that they felt it was easier to swim in one direction versus another in the 2013 World Championships. Some analysts have posited that there was a swirling current in the pool. In this chapter, you'll investigate this claim! References: Quartz Media, Washington Post, SwimSwam (and also here), and Cornett, et al.

4. Statistical seismology and the Parkfield region

Herein, you'll use your statistical thinking skills to study the frequency and magnitudes of earthquakes. Along the way, you'll learn some basic statistical seismology, including the Gutenberg-Richter law. This exercise exposes two key ideas about data science: 1) As a data scientist, you wander into all sorts of domain-specific analyses, which is very exciting. You constantly get to learn. 2) You are sometimes faced with limited data, which is also the case for many of these earthquake studies. You can still make good progress!

5. Earthquakes and oil mining in Oklahoma

Of course, earthquakes have a big impact on society, and recently some have been connected to human activity. In this final chapter, you'll investigate the effect that increased injection of saline wastewater due to oil mining in Oklahoma has had on the seismicity of the region.

Hone your real-world data science skills in our course Case Studies in Statistical Thinking!

September 20, 2017 02:30 PM


Andre Roberge

Reeborg's World partially available in Polish

Thanks to Adam Jurkiewicz, Reeborg's World is now partially available in Polish.

September 20, 2017 09:17 AM


Python Bytes

#44 pip install malicious-code

This episode is brought to you by Datadog: pythonbytes.fm/datadog

Michael #1: Ten Malicious Libraries Found on PyPI
https://arstechnica.com/information-technology/2017/09/devs-unknowingly-use-malicious-modules-put-into-official-python-repository/

- Code packages available in PyPI contained modified installation scripts.
- Vulnerabilities were introduced into the setup.py execution of approximately 20 packages on PyPI.
- Package names closely resembled those of packages in the standard Python library (e.g. urlib vs urllib).
- The packages contained the exact same code as the upstream libraries except for an installation script.
- Officials with the Slovak authority said they recently notified PyPI administrators of the activity, and all identified packages were taken down immediately. Removal of the infected libraries, however, does nothing to purge them from servers that installed them.
- From the PSF: "Unlike some language package management systems, PyPI does not have any full time staff devoted to it. It is a volunteer run project with only two active administrators. As such, it doesn't currently have resources for some of the proposed solutions such as actively monitoring or approving every new project published to PyPI. Historically and by necessity we've relied on a reactive strategy of taking down potentially malicious projects as we've become aware of them."
- Comments:
  - pip gets more paranoid in the install process: https://arstechnica.com/information-technology/2017/09/devs-unknowingly-use-malicious-modules-put-into-official-python-repository/?comments=1&post=33997861
  - Downloads were not super bad: https://arstechnica.com/information-technology/2017/09/devs-unknowingly-use-malicious-modules-put-into-official-python-repository/?comments=1&post=34000031
  - Stestagg is sitting on lots of misspellings: https://arstechnica.com/information-technology/2017/09/devs-unknowingly-use-malicious-modules-put-into-official-python-repository/?comments=1&post=33999957
  - An undergrad thesis compromised Ruby and NodeJS too: https://arstechnica.com/information-technology/2017/09/devs-unknowingly-use-malicious-modules-put-into-official-python-repository/?comments=1&post=33999819
- Related:
  - Original warning: http://www.nbu.gov.sk/skcsirt-sa-20170909-pypi/
  - stdlib names no longer allowed: https://github.com/pypa/warehouse/pull/2409

Brian #2: PyPI migration to Warehouse is in progress

- Thanks to Jonas Neubert for researching this topic and writing a blog post titled "Publishing your First PyPI Package by/for the Absolute Beginner": https://jonemo.github.io/neubertify/2017/09/13/publishing-your-first-pypi-package/
- The steps to publish to PyPI have changed with the move to Warehouse and pypi.org.
- pypi.org is no longer in read-only mode; it is where you publish packages.
- The old APIs at pypi.python.org/pypi are disabled; if you have a .pypirc file you'll have to update the URLs.
- You no longer need to register package names before first uploading; the project gets created on the fly during the first upload of the package.
- The best way to update anything in a package is to change your local package and upload it again (see https://github.com/pypa/warehouse/issues/2170). This includes even just changes to the description. Manual file upload is gone.
- As of right now it looks like you still need to register through pypi.python.org, then do the rest of the interactions with pypi.org. See https://github.com/pypa/warehouse/issues/2065
- Markdown support for package descriptions, like README.md, seems to be coming: https://packaging.python.org/specifications/#description-content-type
- Jonas' blog post is from 13 Sep 2017, so it might be the most up-to-date tutorial on all the steps to get a package onto PyPI.

Brian #3: Live coding in a presentation

- Last week's discussion of David Beazley's Fun of Reinvention talk (https://www.youtube.com/watch?v=js_0wjzuMfc) got me thinking about doing live coding during a presentation, since he did it so well.
- Several links regarding how to do various levels of live coding:
  - Advice for live coding: https://code.tutsplus.com/articles/the-holy-grail-of-conference-talks-live-coding--net-30217
  - Not quite live coding: https://vanslaars.io/post/not-quite-live-coding/
  - Avoiding live coding: https://codeplanet.io/techniques-avoid-live-coding-part/
- Live coding: practice, have a backup plan, don't forget to talk, plan content.
- Not quite: use git tags.
- Avoiding it: my favorite effect is fade-in slideshows where part of the code is shown at a time, so you can talk about it and people know which bit to look at.

Michael #4: Notable REST / Web Frameworks

- Falcon: https://falconframework.org/ (a minimal sketch follows these show notes)
  - Unburdening APIs for over 4.70 x 10^-2 centuries (4.7 years).
  - Falcon is a bare-metal Python web API framework for building very fast app backends and microservices.
  - Complementary: Falcon complements more general Python web frameworks by providing bare-metal performance and flexibility wherever you need it.
  - Compatible: Thanks to WSGI, Falcon runs on a large variety of web servers and platforms. Falcon works great with CPython 2.6, 2.7, and 3.3+. Try PyPy for an extra speed boost.
- Hug: http://hug.rest
  - Drastically simplify API development over multiple interfaces.
  - With hug, design and develop your API once, then expose it however your clients need to consume it, be it locally, over HTTP, or through the command line.
  - Built-in documentation.

Brian #5: Tox

- "The name of the tox automation project (https://pypi.python.org/pypi/tox) derives from 'testing out of the box'. It aims to 'automate and standardize testing in Python'. Conceptually it is one level above pytest and serves as a command line frontend for running tests and automate all kinds of tasks around the project. It also acts as a frontend for Continuous Integration Systems to unify what you do locally and what happens in e.g. Jenkins or Travis CI." - Oliver Bestweller
- A small tox.ini file:

    [tox]
    envlist = py27,py35,py36

    [testenv]
    deps = pytest
    commands = pytest

- You place this in your package source directory and then run tox, which will:
  - Use setup.py to create an sdist
  - Create a virtual environment for each environment in envlist
  - Install dependencies in the environments
  - Install your package into the environment
  - Run the tests
  - Do this for multiple environments, so multiple Python versions (as an example)
- Tox is much more powerful than that, but that's how many people use it.
- Further reading:
  - http://tox.readthedocs.io/en/latest/index.html
  - http://tox.readthedocs.io/en/latest/example/basic.html
  - https://blog.ionelmc.ro/2015/04/14/tox-tricks-and-patterns/

Michael #6: flake8-tidy-imports and deprecated imports
https://pypi.python.org/pypi/flake8-tidy-imports#options

- You can declare {python2to3} as a banned-module import, and it will check against a long list of import moves and removals between Python 2 and Python 3, suggesting relevant replacements if available.
- "I meticulously compiled this list by reading release notes from Python 3.0-3.6 as well as testing in a large legacy Python codebase, but I presumably missed a few."
- Example:

    flake8 file.py
    file.py:1:1: I201 Banned import 'mock' used - use unittest.mock instead.

Michael #7 (bonus!): Help Me Offer Coaching to First-Time PyGotham Speakers
https://emptysqua.re/blog/coaching-for-first-time-pygotham-speakers/

- Via A. Jesse Jiryu Davis
- "I want to raise $1200 for public-speaking coaching for first-time speakers at PyGotham, the New York City Python conference. Will you chip in?"
- Jesse is a PyGotham conference organizer, but is launching this fundraiser independently of PyGotham.
- As of September 19, the goal has been raised. Thanks to everyone who donated!

Our news

Michael:
- Finished writing my free MongoDB course (subscribe to get notified of its release at https://training.talkpython.fm/getnotified)
- python-switch kind of went off the hook (see https://github.com/mikeckennedy/python-switch and https://www.reddit.com/r/Python/comments/70413x/adding_a_switch_statement_to_python/)

Brian:
- The book is shipping: Python Testing with pytest (https://pragprog.com/book/bopytest/python-testing-with-pytest)
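To give a feel for the bare-metal style Falcon advertises, here is a minimal sketch of a Falcon resource. It is not from the show notes; the resource and route names are illustrative, and it targets the Falcon 1.x API that was current at the time.

import json

import falcon


class HelloResource(object):
    def on_get(self, req, resp):
        # Falcon resources respond by setting attributes on the response object.
        resp.body = json.dumps({'message': 'Hello World'})
        resp.content_type = 'application/json'


api = falcon.API()  # a plain WSGI application
api.add_route('/hello', HelloResource())

# Serve with any WSGI server, for example: gunicorn my_module:api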

September 20, 2017 08:00 AM


Python Insider

Python 2.7.14 released

The latest bugfix release in the Python 2.7 series, Python 2.7.14, is now available for download.

September 20, 2017 12:23 AM

September 19, 2017


Python Insider

Python 3.6.3rc1 and 3.7.0a1 now available for testing and more

The Python build factories have been busy the last several weeks preparing
our fall lineup of releases.  Today we are happy to announce three
additions: 3.6.3rc1, 3.7.0a1, and 3.3.7 final, which join last weekend's
2.7.14 and last month's 3.5.4 bug-fix releases and 3.4.7 security-fix
update (see all downloads).

1. Python 3.6.3rc1 is the first release candidate for Python 3.6.3, the next
maintenance release of Python 3.6.  While 3.6.3rc1 is a preview release and,
thus, not intended for production environments, we encourage you to explore
it and provide feedback via the Python bug tracker (https://bugs.python.org).
3.6.3 is planned for final release on 2017-10-02 with the next maintenance
release expected to follow in about 3 months.  You can find Python 3.6.3rc1
and more information here:
    https://www.python.org/downloads/release/python-363rc1/

2. Python 3.7.0a1 is the first of four planned alpha releases of Python 3.7,
the next feature release of Python.  During the alpha phase, Python 3.7
remains under heavy development: additional features will be added
and existing features may be modified or deleted.  Please keep in mind
that this is a preview release and its use is not recommended for
production environments.  The next preview release, 3.7.0a2, is planned
for 2017-10-16. You can find Python 3.7.0a1 and more information here:
    https://www.python.org/downloads/release/python-370a1/

3. Python 3.3.7 is also now available.  It is a security-fix source-only release
and is expected to be the final release of any kind for Python 3.3.x before it
reaches end-of-life status on 2017-09-29, five years after its initial release.
Because 3.3.x has long been in security-fix mode, 3.3.7 may no longer build
correctly on all current operating system releases and some tests may fail.
If you are still using Python 3.3.x, we strongly encourage you to upgrade
now to a more recent, fully supported version of Python 3.  You can find
Python 3.3.7 here:
   https://www.python.org/downloads/release/python-337/

September 19, 2017 09:53 PM


DataCamp

Jupyter Notebook Cheat Sheet

You probably already know the Jupyter Notebook pretty well - it's one of the most well-known parts of the Jupyter ecosystem! If you haven't explored the ecosystem yet, or if you simply want to know more about it, don't hesitate to go and explore it here!

For those who are new to Project Jupyter, the Jupyter Notebook Application produces documents that contain a mix of executable code, text elements, and even HTML, which makes it the ideal place to bring together an analysis description and its results, as well as to perform data analysis in real time. This, combined with its many useful functionalities, explains why the Jupyter Notebook is one of the data scientist's preferred development environments, allowing for interactive, reproducible data science analysis, computation and communication.

One of the other great things about the Jupyter notebooks?

They're super easy to get started with! You might have already noticed this when you read DataCamp's definitive guide to Jupyter Notebook. However, when you first enter the application, you might have to find your way around the variety of functionalities presented to you: from saving your current notebook to adding or moving cells in the notebook or embedding widgets in your notebook - without a doubt, there's a lot to discover when you first get started!

That's why DataCamp made a Jupyter Notebook cheat sheet for those who are just starting out and who want some help finding their way around.

Note also that the Jupyter Notebook Application has a handy “Help” menu that includes a full-blown User Interface Tour! – No worries, we also included this in the cheat sheet :)

Check it out here:

Jupyter Notebook cheat sheet

In short, this cheat sheet will help you kickstart your data science projects, however small or big they might be: with some screenshots and explanations, you'll be a Jupyter Notebook expert in no time!

So what are you waiting for? Time to get started!

September 19, 2017 02:41 PM


Continuum Analytics Blog

What to Do When Things Go Wrong in Anaconda

Below is a question that was recently asked on StackOverflow and I decided it would be helpful to publish an answer explaining the various ways in which to troubleshoot a problem you may be having in Anaconda.

September 19, 2017 02:00 PM


Python Data

Python and AWS Lambda – A match made in heaven

In recent months, I've begun moving some of my analytics functions to the cloud. Specifically, I've been moving many of my Python scripts and APIs to AWS' Lambda platform using the Zappa framework. In this post, I'll share some basic information about Python and AWS Lambda…hopefully it will get everyone out there thinking about new ways to use platforms like Lambda.

Before we dive into an example of what I'm moving to Lambda, let's spend some time talking about Lambda. When I first heard about it, I was confused…but once I 'got' it, I saw the value. Here's the description of Lambda from AWS' website:

AWS Lambda lets you run code without provisioning or managing servers. You pay only for the compute time you consume – there is no charge when your code is not running. With Lambda, you can run code for virtually any type of application or backend service – all with zero administration. Just upload your code and Lambda takes care of everything required to run and scale your code with high availability. You can set up your code to automatically trigger from other AWS services or call it directly from any web or mobile app.

Once I realized how easy it is to move code to lambda to use whenever/wherever I needed it, I jumped at the opportunity.  But…it took a while to get a good workflow in place to simplify deploying to lambda. I stumbled across Zappa and couldn’t be happier…it makes deploying to lambda simple (very simple).

OK.  So. Why would you want to move your code to Lambda?

Lots of reasons. Here’s a few:

There are many other more sophisticated reasons of course, but these’ll do for now.

Let’s get started looking at python and AWS Lambda.  You’ll need an AWS account for this.

First - I'm going to talk a bit about building an API endpoint using Flask. You don't have to use Flask, but it's an easy framework to use and you can quickly build an API endpoint with it with very little fuss. With this example, I'm going to use Lambda to host an API endpoint that uses the Newspaper library to scrape a website, pull down the text and return that text to my local script.

Writing your first Flask + Lambda API

To get started, install Flask, Flask-RESTful and Zappa. You'll want to do this in a fresh environment using virtualenv (see my previous posts about virtualenv and vagrant) because we'll be moving this up to Lambda using Zappa.

pip install flask flask_restful zappa

Our flask driven API is going to be extremely simple and exist in less than 20 lines of code:

from flask import Flask
from newspaper import Article
from flask_restful import Resource, Api

app = Flask(__name__)
api = Api(app)


class hello(Resource):
    def get(self):
       return "Hello World"

api.add_resource(hello, '/hello')

if __name__ == '__main__':
    app.run(debug=True, host='0.0.0.0', port=5001)

Note: The host='0.0.0.0' and port=5001 arguments are extraneous and are how I use Flask with Vagrant. If you keep this in and run it locally, you'd need to visit http://0.0.0.0:5001 to view your app.
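The snippet above only returns "Hello World"; the Newspaper-based endpoint described earlier is not shown in the post's snippet. Below is a minimal sketch of what such a resource might look like, written as a self-contained file for clarity; the resource name and the url query parameter are illustrative, not from the original post.

from flask import Flask, request
from flask_restful import Resource, Api
from newspaper import Article

app = Flask(__name__)
api = Api(app)


class text(Resource):
    def get(self):
        # e.g. GET /text?url=https://example.com/some-article
        url = request.args.get('url')
        article = Article(url)
        article.download()   # fetch the page
        article.parse()      # extract the article body
        return {'url': url, 'text': article.text}


api.add_resource(text, '/text')

if __name__ == '__main__':
    app.run(debug=True, host='0.0.0.0', port=5001)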

The last thing you need to do is build your requirements.txt file for Zappa to use when building your application files to send to Lambda. For a quick/dirty requirements file, I used the following:

zappa
newspaper
flask
flask_restful

Now…let's get this up to Lambda. With Zappa, it's as easy as a couple of command line instructions.

First, run the init command from the command line in your virtualenv:

zappa init

You should see something similar to this:

zappa init screenshot

You'll be asked a few questions; you can hit 'enter' to take the defaults or enter your own. For this example, I used 'dev' for the environment name (you can set up multiple environments for dev, staging, production, etc) and made an S3 bucket for use with this application.

Zappa should realize you are working with a Flask app and automatically set things up for you. It will ask you what the name of your Flask app's main function is (in this case it is api.app). Lastly, Zappa will ask if you want to deploy to all AWS regions…I chose not to for this example. Once complete, you'll have a zappa_settings.json file in your directory that will look something like the following:

{
    "dev": {
        "app_function": "api.app", 
        "profile_name": "default", 
        "s3_bucket": "DEV_BUCKET_NAME" #I removed the S3 bucket name for security purposes
    }
}

I’ve found that I need to add more information to this json file before I can successfully deploy. For some reason, Zappa doesn’t add the “region” to the settings file. I also like to add the “runtime” as well. Edit your json file to read (feel free to use whatever region you want):

{
    "dev": {
        "app_function": "api.app", 
        "profile_name": "default", 
        "s3_bucket": "DEV_BUCKET_NAME",
        "runtime": "python2.7",
        "aws_region": "us-east-1"
    }
}

Now…you are ready to deploy. You can do that with the following command:

zappa deploy dev

Zappa will set up all the necessary configurations and systems on AWS AND zip up your libraries and code and push it to Lambda.   I’ve not found another framework as easy to use as Zappa when it comes to deploying…if you know of one feel free to leave a comment.

After a minute or two, you should see a “Deployment Complete: …” message that includes the endpoint for your new API. In this case, Zappa built the following endpoint for me:

https://4wq2muonbb.execute-api.us-east-1.amazonaws.com/dev

If you make some changes to your code and need to update Lambda, Zappa makes it easy to do that with the following command:

zappa update dev

Additionally, if you want to add a ‘production’ lambda environment, all you need to do is add that new environment to your settings json file and deploy it. For this example, our settings file would change to:

{
    "dev": {
        "app_function": "api.app", 
        "profile_name": "default", 
        "s3_bucket": "DEV_BUCKET_NAME",
        "runtime": "python2.7",
        "aws_region": "us-east-1"
    },
    "prod": {
        "app_function": "api.app", 
        "profile_name": "default", 
        "s3_bucket": "PROD_BUCKET_NAME",
        "runtime": "python2.7",
        "aws_region": "us-east-1"
    }
}

Next, do a deploy prod and your production environment is ready to go at a new endpoint.

zappa deploy prod

Interfacing with the API

Our code is pushed to Lambda and ready to start accepting requests.  In this example’s case, all we are doing is returning “hello world” but you can see the power in this for other functionality.  To check out the results, just open a browser and enter your Zappa Deployment URL and append /hello to the end of it like this:

https://4wq2muonbb.execute-api.us-east-1.amazonaws.com/dev/hello

You should see the standard “Hello World” response in your browser window.
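Since the original idea was to return text to a local script, here is a minimal sketch of calling the deployed endpoint from local Python code with the requests library; the URL is simply the example endpoint shown above.

import requests

response = requests.get('https://4wq2muonbb.execute-api.us-east-1.amazonaws.com/dev/hello')
print(response.status_code)  # 200 if the Lambda function ran
print(response.json())       # the JSON-encoded "Hello World" string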

You can find the code for the lambda api.py function here.

Note: At some point, I’ll pull this endpoint down…but will leave it up for a bit for users to play around with.

 

The post Python and AWS Lambda – A match made in heaven appeared first on Python Data.

September 19, 2017 01:42 PM


Python Piedmont Triad User Group

PYPTUG monthly meeting: Plotly, dash and company

Come join PYPTUG at our next monthly meeting (September 19th 2017) to learn more about the Python programming language, modules and tools. Python is the ideal language to learn if you've never programmed before, and at the other end, it is also a tool that no expert would do without. Monthly meetings are in addition to our project nights.



What

Meeting will start at 6:00pm.

Main Talk: "Plotly, dash and company"

by Francois Dion

Remake of W. Playfair's classic visualization (source: Plot.ly)

Abstract:

There are many visualization packages available out there, each best suited to specific scenarios. In the past several years, I've covered Matplotlib, Seaborn, Vincent, ggplot, 3d visualizations through matplotlib, D3.js and mpld3 and Bokeh. In this presentation we will cover plotly (for javascript, R and Python) and related packages and when it makes sense to use it.


Bio:


Francois Dion is the founder and Chief Data Scientist of Dion Research LLC, specializing in analytics, data science, IoT and visualization. 

He is the author of several open source packages, such as stemgraphic (www.stemgraphic.org), the founder of the Python user group for the Piedmont Triad of North Carolina (www.pyptug.org), and mentors various groups in Python, R and analytics at large. You might have run across his multi-part series on LinkedIn on data science books, including part V on Visualization.

When:

Please note, this meeting will be one week early in the month compared to our normal schedule:

Tuesday, September 19th 2017
Meeting starts at 6:00PM

Where:
Wake Forest University, close to Polo Rd and University Parkway:
Manchester Hall
Wake Forest University, Winston-Salem, NC 27109



And speaking of parking: parking after 5pm is on a first-come, first-served basis. The official parking policy is:
"Visitors can park in any general parking lot on campus. Visitors should avoid reserved spaces, faculty/staff lots, fire lanes or other restricted area on campus. Frequent visitors should contact Parking and Transportation to register for a parking permit."

Mailing List:

Don't forget to sign up to our user group mailing list:


It is the only step required to become a PYPTUG member.

Please RSVP so we have enough food for people attending!
RSVP on meetup:
https://www.meetup.com/PYthon-Piedmont-Triad-User-Group-PYPTUG/events/242721091/

September 19, 2017 08:48 AM


S. Lott

Three Unsolvable Problems in Computing

The three unsolvable problems in computing:


Let's talk about naming.

The project team decided to call the server component "FlaskAPI".

Seriously.

It serves information about two kinds of resources: images and running instances of images. (Yes, it's a kind of kubernetes/dockyard lite that gives us a lot of control over servers with multiple containers.)

The feature set is growing rapidly. The legacy name needs to change. As we move forward, we'll be adding more microservices. Unless they have a name that reflects the resource(s) being managed, this is rapidly going to become utterly untenable.

Indeed, the name chosen may already be untenable: the name doesn't reflect the resource, it reflects an implementation choice that is true of all the microservices. (It's a wonder they didn't call it "PythonFlaskAPI".)

See https://blogs.mulesoft.com/dev/api-dev/best-practices-for-building-apis/ for some general guidelines on API design.

These guidelines don't seem to address naming in any depth. There are a few blog posts on this, but there seem to be two extremes.

I'm not a fan of an orchestration layer. But there's this: https://medium.com/capital-one-developers/microservices-when-to-react-vs-orchestrate-c6b18308a14c  tl;dr: Orchestration is essentially unavoidable.

There are articles on choreography. https://specify.io/concepts/microservices the idea is that an event queue is used to choreograph among microservices. This flips orchestration around a little bit by having a more peer-to-peer relationship among services. It replaces complex orchestration with a message queue, reducing the complexity of the code.

On the one hand, orchestration is simple. The orchestrator uses the resource class and content-type version information to find the right server. It's not a lot of code.

On the other hand, orchestration is overhead. Each request passes through two services to get something done. The pace of change is slow. HATEOAS suggests that a "configuration" or "service discovery" service (with etags to support caching and warning of out-of-date cache) might be a better choice. Clients can make a configuration request, and if cache is still valid, it can then make the real working request.
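For illustration only, here is a rough sketch of that client-side flow using the requests library; the discovery endpoint below is hypothetical, not part of the project discussed here.

import requests

DISCOVERY_URL = 'https://config.example.com/services'  # hypothetical service-discovery endpoint


def get_config(cached_etag=None, cached_body=None):
    # Revalidate the cached configuration with a conditional GET.
    headers = {'If-None-Match': cached_etag} if cached_etag else {}
    response = requests.get(DISCOVERY_URL, headers=headers)
    if response.status_code == 304:
        # Cache still valid: reuse the copy we already have.
        return cached_etag, cached_body
    response.raise_for_status()
    return response.headers.get('ETag'), response.json()


# First call fills the cache; later calls are cheap 304 revalidations.
etag, config = get_config()
etag, config = get_config(etag, config)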

The client-side overhead is a burden that is -- perhaps -- a bad idea. It has the potential to make  the clients very complex. It can work if we're going to provide a sophisticated client library. It can't work if we're expecting developers to make RESTful API requests to get useful results. Who wants to make the extra meta-request all the time?

September 19, 2017 08:07 AM


Talk Python to Me

#130 10 books Python developers should be reading

One of the hallmarks of successful developers is continuous learning. The best developers I know don't just keep learning, it's one of the things that drives them. That's why I'm excited to bring you this episode on 10 books Python developers should read.

September 19, 2017 08:00 AM


Catalin George Festila

The numba python module - part 002 .

Today I tested how fast jit from the numba Python module is, using the Fibonacci math function.
You will see the strange output I got for some values.
First example:

import numba
from numba import jit
from timeit import default_timer as timer

def fibonacci(n):
    a, b = 1, 1
    for i in range(n):
        a, b = a+b, a
    return a
fibonacci_jit = jit(fibonacci)

start = timer()
fibonacci(100)
duration = timer() - start

startnext = timer()
fibonacci_jit(100)
durationnext = timer() - startnext

print(duration, durationnext)
The result of this run is:
C:\Python27>python numba_test_003.py
(0.00018731270733896962, 0.167499256682878)

C:\Python27>python numba_test_003.py
(1.6357787798437412e-05, 0.1683614083221368)

C:\Python27>python numba_test_003.py
(2.245186560569841e-05, 0.1758382003097716)

C:\Python27>python numba_test_003.py
(2.3093347480146938e-05, 0.16714964906130353)

C:\Python27>python numba_test_003.py
(1.5395564986764625e-05, 0.17471143739730277)

C:\Python27>python numba_test_003.py
(1.5074824049540363e-05, 0.1847134227837042)
As you can see, the plain fibonacci function runs quickly here, while the jit-compiled call looks slow because the measurement includes the just-in-time compilation itself (see the timing sketch at the end of this post).
Let's see whether changing the Python source code (adding print statements) slows it down, and also that the new source code with jit does not work well:
import numba
from numba import jit
from timeit import default_timer as timer

def fibonacci(n):
    a, b = 1, 1
    for i in range(n):
        a, b = a+b, a
    return a
fibonacci_jit = jit(fibonacci)

start = timer()
print fibonacci(100)
duration = timer() - start

startnext = timer()
print fibonacci_jit(100)
durationnext = timer() - startnext

print(duration, durationnext)
The result is this:
C:\Python27>python numba_test_003.py
927372692193078999176
1445263496
(0.0002334994022992635, 0.17628787910376)

C:\Python27>python numba_test_003.py
927372692193078999176
1445263496
(0.0006886307922204926, 0.17579169287387408)

C:\Python27>python numba_test_003.py
927372692193078999176
1445263496
(0.0008105123483657127, 0.18209553525407973)

C:\Python27>python numba_test_003.py
927372692193078999176
1445263496
(0.00025466830415606486, 0.17186550306131188)

C:\Python27>python numba_test_003.py
927372692193078999176
1445263496
(0.0007348174871807866, 0.17523103771560608)
The result for value 100 is not the same: 927372692193078999176 versus 1445263496.
The first problem is speed: numba cannot always infer the types used in a function, and when it falls back to treating values as generic objects the compiled code is slow (you can print numba.typeof(...) on a value to see how numba is typing it).
The second problem is the wrong output, which comes from a related cause: the jit-compiled function uses fixed-width machine integers, so the result overflows for a value as large as fibonacci(100), while plain Python integers have arbitrary precision.
I tested with value 5 and the result is:
C:\Python27>python numba_test_003.py
13
13
13
13
(0.0007258367409385072, 0.17057997338491704)

C:\Python27>python numba_test_003.py
13
13
(0.00033709872502270044, 0.17213235952108247)

C:\Python27>python numba_test_003.py
13
13
(0.0004836773333341886, 0.17184433415945508)

C:\Python27>python numba_test_003.py
13
13
(0.0006854233828482501, 0.17381272129120037)
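A minimal sketch (not part of the original tests) that separates the compilation cost from the execution cost: call the jit function once to trigger compilation, then time a second call. A small n is used so the result also fits into numba's fixed-width integers.

from numba import jit
from timeit import default_timer as timer

def fibonacci(n):
    a, b = 1, 1
    for i in range(n):
        a, b = a+b, a
    return a

fibonacci_jit = jit(fibonacci)

start = timer()
fibonacci_jit(30)               # first call: includes the just-in-time compilation
first_call = timer() - start

start = timer()
fibonacci_jit(30)               # second call: runs the already-compiled code
second_call = timer() - start

print(first_call, second_call)  # the second call should be much faster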

September 19, 2017 05:48 AM

The numba python module - part 001 .

Today I tested the numba python module.
This Python module allows us to speed up applications with high-performance functions written directly in Python.
The numba python module works by generating optimized machine code using the LLVM compiler infrastructure at import time, runtime, or statically.
The code can be just-in-time compiled to native machine instructions, similar in performance to C, C++ and Fortran.
For the installation, I used the pip tool:

C:\Python27>cd Scripts

C:\Python27\Scripts>pip install numba
Collecting numba
Downloading numba-0.35.0-cp27-cp27m-win32.whl (1.4MB)
100% |################################| 1.4MB 497kB/s
...
Installing collected packages: singledispatch, funcsigs, llvmlite, numba
Successfully installed funcsigs-1.0.2 llvmlite-0.20.0 numba-0.35.0 singledispatch-3.4.0.3

C:\Python27\Scripts>pip install numpy
Requirement already satisfied: numpy in c:\python27\lib\site-packages
The example from the official website works well.
The example source code is:
from numba import jit
from numpy import arange

# jit decorator tells Numba to compile this function.
# The argument types will be inferred by Numba when function is called.
@jit
def sum2d(arr):
    M, N = arr.shape
    result = 0.0
    for i in range(M):
        for j in range(N):
            result += arr[i,j]
    return result

a = arange(9).reshape(3,3)
print(sum2d(a))
The result of running this Python script is:
C:\Python27>python.exe numba_test_001.py
36.0
Another example uses just-in-time compilation through Numba's jit function:
import numba
from numba import jit

def fibonacci(n):
    a, b = 1, 1
    for i in range(n):
        a, b = a+b, a
    return a

print fibonacci(10)

fibonacci_jit = jit(fibonacci)
print fibonacci_jit(14)
You can also use jit as a decorator:
@jit
def fibonacci_jit(n):
    a, b = 1, 1
    for i in range(n):
        a, b = a+b, a
    return a
Numba is a complex Python module because it relies on compilation.
Compiling takes time, which is especially noticeable for small functions.
The Numba module tries to do its best by caching compilations as much as possible, though.
Another note: not all code is compiled equally.

September 19, 2017 05:17 AM

The beauty of Python: subprocess module - part 004 .

This series of tutorials, called "The beauty of Python" and started at the beginning of this blog, is aimed at the simplicity and beauty of the Python programming language.
The main goal is to show how to use this programming language for everyday tasks.
Today I will give some examples that serve this goal and show you how to use the subprocess Python module.
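The post's own examples are not included in this feed excerpt; as a placeholder, here is a minimal sketch of common subprocess usage (running a child process, capturing its output, and checking an exit code).

import subprocess
import sys

# Run a small Python command in a child process and capture its standard output.
output = subprocess.check_output(
    [sys.executable, '-c', 'print("hello from a child process")'])
print(output)

# Run a command and only check its exit code (0 means success).
return_code = subprocess.call([sys.executable, '-c', 'import sys; sys.exit(0)'])
print('exit code:', return_code)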

September 19, 2017 05:17 AM


Yasoob Khalid

13 Python libraries to keep you busy

Hi guys! I was recently contacted by folks from AppDynamics (a part of CISCO). They shared an infographic with me which listed 13 Python libraries. These libraries were categorized in sections. I loved going through that infographic. I hope you guys will enjoy it too. 

Source: AppDynamics


September 19, 2017 01:22 AM


Daniel Bader

Contributing to Python Open-Source Projects

Contributing to Python Open-Source Projects

How can you become a contributor on popular, “high-profile” Python open-source projects like Django, Requests, and so on?

Contributing to open-source projects is a great way to build your programming skills, take part in the community, and to make a real impact with your code…

It can also help you get a job as a professional Python developer, but becoming a contributor in the first place—that’s often tough.

So, let’s talk about this question I got from newsletter member Sudhanshu the other day:

Hi Dan,

I am student from India, I don’t really know whether this is your field or not but I have been doing Django development for 5 to 6 months.

I have made few projects on REST APIs, websites, etc. Then I decided to contribute in Django open source projects, particularly those by the Django organization and Mozilla.

What should I do at this point? How can I improve my level of Python knowledge so that I can contribute to these projects?

It sounds like Sudhanshu is in a good spot already.

I love the fact that he’s been working on his own side-projects to build up a portfolio—that’ll be a great asset when he goes job hunting.

If you’re in Sudhanshu’s shoes right now, here’s what I’d focus on next:

Try to strike up some personal connections with people working on those “high-profile” Python projects you want to contribute to.

See if you can make contact somehow—are they on Twitter? Can you comment or ask a question on a GitHub issue? Maybe you can even cold-email them…

Little by little, you’ll be able to build relationships with some of them. Building trust takes a lot of time and dedication, but eventually the timing will be right to offer your help:

Just ask them if there’s something small you could contribute to, like cleaning up the documentation, or fixing typos—simple things like that.

Open-source maintainers usually appreciate it when others help improve the documentation of a project. So that's often a good way for you to get your foot in the door, metaphorically speaking.

What I want to say is this:

Getting your contributions accepted comes down much more to having built trust with the right people, rather than “throwing a bunch of code over the wall” and creating random pull-requests.

If you’re interested in some more thoughts on this topic, check out the YouTube video I recorded. It contains additional tips and tactics that will help you break into the open-source world:

» Subscribe to the dbader.org YouTube Channel for more Python tutorials.

Good luck on your Python open-source journey and…Happy Pythoning!

September 19, 2017 12:00 AM

September 18, 2017


Carl Chenet

The Github threat

Voices are raised now and then against the risks linked to the use of Github by Free Software projects. Yet the infatuation with the Californian start-up's Octocat-branded collaborative forge doesn't seem to fade away.

In recent years, Github and its services have taken on an important role in software engineering: they are seen as easy to use, efficient for daily work, and offering interesting functions for enterprise collaborative workflows as well as for Free Software projects. What are the arguments against using these services, and are they valid? We will list them first, then examine their validity.

1. Critical points

1.1 Centralization

The Github application belongs to a single entity, Github Inc, a US company which manages it alone. So a single company under US legislation controls access to most Free Software source code, which may become a problem for the groups using it if a code repository is no longer available, for political or technical reasons.

The Octocat, the Github mascot

 

This centralization leads to another problem: since Github reached critical mass, it has become more and more difficult not to have a Github account. People who don't use Github, by choice or not, are becoming a silent minority. It is now fashionable to use Github, and not doing so is seen as being "out of date". The same phenomenon is a classic, and even the norm, for proprietary social networks (Facebook, Twitter, Instagram).

1.2 A Proprietary Software

When you interact with Github, you are using proprietary software, with no access to its source code, and which may not work the way you think it does. This is a problem at different levels: first ideologically, but foremost in practice. In the Github case, we send them code we can control outside of their interface. We also send them personal information (profile, Github interactions). And above all, Github forces any project which goes through the US platform to use a crucial proprietary tool: its bug tracking system.

Windows, the epitome of proprietary software, even if others took the same path

 

1.3 The Uniformization

Working with the Github interface seems easy and intuitive to most. Lots of companies now use it as a source repository, and many developers leaving a company find the same Github working environment at the next one. This pervasive presence of Github in the free software development world is part of the uniformization of developers' working space.

Uniforms always bring Army in my mind, here the Clone army

2 – Critical points cross-examination

2.1 Regarding the centralization

2.1.1 Service availability rate

As said above, Github is nowadays the main repository of Free Software source code. As such it is a favorite target for cyberattacks: DDoS attacks hit it in March and August 2015. On December 15, 2015, an outage made 5% of the repositories inaccessible, and the same happened on November 15. And these are only the incidents reported by Github itself; one can imagine that the real outage rate of the platform is underestimated.

2.1.2 Chain reaction could block Free Software development

Today many dependency management tools, such as npm for JavaScript, Bundler for Ruby or even pip for Python, can fetch an application's source code directly from Github. With Free Software projects becoming more and more interlinked and codependent, if one component is down, the whole development process stops.

One of the best examples is the npmgate. Any company could legally demand that Github take down some source code from its repository, which could create a chain reaction and block the development of many Free Software projects, as the Node.js community suffered from the decisions of npm, Inc, the company managing npm.

2.2 A historical precedent: SourceForge

Github didn't appear out of the blue. In its time, its predecessor, SourceForge, was also extremely popular.

Heavily centralized and based on strong interaction with the community, SourceForge is now seen as an aging SaaS (Software as a Service) and sees most of its users fleeing to Github, which creates lots of hurdles for those who stayed. The Gimp project suffered from spam and terrible advertising, which led to the departure of the VLC project, and then from installers corrupted with adware being offered instead of the official Gimp installer for Windows. And finally, the Gimp project's SourceForge account was hijacked by… the SourceForge team itself!

These are very recent examples of what a commercial entity can do when it is under pressure from its stakeholders. It is vital to really understand what it means to trust such an entity with the centralization of data and exchanges, when this could have tremendous repercussions on the day-to-day life and habits of the Free Software and open source community.

2.3. Regarding proprietary software

2.3.1 One community, several opinions on proprietary software

Mostly based on ideology, this point deals with the definition every member of the community gives to Free Software and open source. It mostly comes down to one thing: is it viral or not? Or, put differently, GPL vs MIT/BSD.

Those on the side of viral Free Software will have trouble using proprietary software, since in their view it shouldn't even exist. It must be assimilated, to quote Star Trek, as it is a connected black box, endangering privacy, corrupting our usage for profit, and restraining our freedom to use what we own as we please, etc.

Those on the side of complete freedom have no qualms about using proprietary software, as its very existence is a consequence of freedom without restriction. They even accept that code they developed may become part of proprietary software, which is quite a common occurrence. This part of the Free Software community has no qualms about using Github, which fits well within its ideological parameters. Just take a look at the Janson amphitheater during FOSDEM and count how many Apple laptops running macOS are around.

FreeBSD, the main BSD project under the BSD license

2.3.2 Data loss and data restrictions linked to proprietary software use

Even without ideological consideration, and just focusing on Github infrastructure, the bug tracking system is a major problem by itself.

Bug reports build the memory of Free Software projects. They are the entry point for new contributors, the place to find bug reports, requests for new functions, etc. The project's history can't be limited to the code alone. It's very common to find bug reports when you copy and paste an error message into a search engine. Their historical importance is precious not only for the project itself, but also for its present and future users.

Github gives the ability to extract bug reports through its API. What would happen if Github went down, or if the platform no longer supported this feature? In my opinion, not many projects have ever thought about this outcome. How would they move all the data generated on Github into a new bug tracking system?

One older example is Astrid, a TODO-list application bought by Yahoo a few years ago. Very popular, it grew fast until it was closed overnight, with only a few weeks for its users to extract their data, and it was only a to-do list. The same situation with Github would be tremendously difficult to manage for many projects, if they were even able to deal with it. The code would still be available and could live somewhere else, but the project's memory would be lost. A project like Debian today has more than 800,000 bug reports, which are a treasure trove of data about problems solved, feature requests and where development stands on each. The developers of the CPython project have anticipated this problem and decided not to use Github's bug tracking system.

Issues, the Github proprietary bug tracking system

Another thing we could lose if Github suddenly disappeared: all the work currently in progress in pull requests (aka PRs). This Github feature lets you clone a project's Github repository, modify it to fit your needs, then offer your modifications back to the original repository. The original repository's owner will then review said modifications and, if he or she agrees with them, merge them into the original repository. As such, it's one of the main advantages of Github, since it can all be done easily through its graphical interface.

However, reviewing all the PRs may take quite a long time, and most successful projects have several PRs in flight. These PRs and the proprietary bug tracking system are commonly used as a platform for comments and discussion between developers.

The code itself is not lost if Github goes down (except in one specific situation, described below), but the peer-review work materialized in the PRs and in the bug tracking system is lost. Let's remember that the PR mechanism lets you clone and modify a project and then generate PRs directly from Github's proprietary web interface, without downloading a single line of code to your computer. In this particular case, if Github goes down, all that code and work in progress is lost.

Some also use Github as a bookmarking tool, following their favorite projects' activity through the Watch function. This kind of technology-watch data would also be lost if Github went down.

Debian, one of the main Free Software projects with at least a thousand official contributors

2.4 Uniformization

The Free Software community walks a tightrope between the normalization needed for easier interoperability between its products and an attraction to novelty, driven by a strong need to differentiate itself from what already exists.

Github popularized the use of Git, a great tool now used in various sectors far away from its original programming field. Step by step, Git has become so prominent that it's almost impossible to even think of another source control manager, even though awesome, if unfortunately less popular, alternatives such as Mercurial exist.

A new Free Software project is now a Git repository on Github with a README.md added as a quick description. All other solutions are ostracized: few if any potential contributors would even notice such projects. It now seems very difficult to ask potential contributors to learn a new source control manager AND a new forge for every project they want to contribute to, even though that was a basic requirement a few years ago.

It's quite sad, because Github, by offering a particular experience to its users, cuts them off from a whole realm of possibilities. Maybe Github is one of the best web-based version control platforms. But being the dominant one leaves no room for a new competitor to grow, and it lets Github initiate newcomers to development with a narrow feature set that is largely unrelated to the strength of the Git tool itself.

3. Centralization, uniformization, proprietary software… What’s next? Laziness?

The fight against centralization is a core part of the Free Software ideology, since centralization strengthens the power of those who manage it and who, through it, control those who are managed by it. The allergy to uniformization, born in reaction to the big software companies and their wish to impose a closed, commercial software world, was for a long time the main fuel for the thirst for innovation and the development of intelligent alternatives. As we said above, part of the Free Software community was built as a reaction to proprietary software and the threat it represents. The other part, without hoping for its disappearance, still chose a development model opposite to that of proprietary software, at least in the beginning; nowadays there are more and more bridges between the two.

The Github effect is a morbid one because of its consequences: centralization, uniformization, and the use of proprietary software, at the very least as a bug tracking system. But a few years ago the Dear Github buzz revealed one more side effect, one I had never thought about: laziness. For those who don't know what it is about, this letter is a complaint from spokespersons of several Free Software projects demanding that the Github team finally implement, after years of polite asking, new functions.

Since when do Free Software projects facing a roadblock plead for clemency instead of building the path they need themselves? When Torvalds ran into the BitKeeper problem and the Linux kernel development team could no longer use their revision control software, he developed Git. The mere fact of not being able to use a tool, or of missing functions, is the main motivation to seek alternative solutions and, as such, is at the heart of the Free Software movement. Every Free Software community member able to code should have this reflex. You don't like what Github offers? Switch to Gitlab. You don't like Gitlab? Improve it or build your own solution.

The Gitlab logo

Let's be crystal clear: I've never said that every blocked Free Software developer should code his or her own alternative. We all have our own priorities, and some of us even like our beauty sleep, myself included. But seeing that this open letter to Github has 1340 names attached to it, among them spokespersons for major Free Software projects, showed me that the need, willpower and strength to code a replacement are there. Maybe such a replacement will be born from this letter; that would be the best outcome of this buzz.

In the end, Github usage is just another example of the massification of Internet usage. Just as Internet users flock to massively centralized social networks such as Facebook or Twitter, developers are following the same path with Github. Even if a large fraction of developers realize the threat linked to this centralized and proprietary organization, the whole community is following the centralization and uniformization trend. The Github service is useful, free or reasonably priced (depending on the functions you need), easy to use, and up most of the time. Why would we try something else? Maybe because others are using us while we savor the convenience? The Free Software community seems quite sleepy to me.

The lion enjoying the hearth warm

About Me

Carl Chenet, Free Software Indie Hacker, founder of the French-speaking Hacker News-like Journal du hacker.

Follow me on social networks

Translated from French by Stéphanie Chaptal. Original article written in 2015.

September 18, 2017 10:00 PM


NumFOCUS

The Econ-ARK joins NumFOCUS Sponsored Projects

​NumFOCUS is pleased to announce the addition of the Econ-ARK to our fiscally sponsored projects. As a complement to the thriving QuantEcon project (also a NumFOCUS sponsored project), the Econ-ARK is creating an open-source resource containing the tools needed to understand how diversity across economic agents (in preferences, circumstances, knowledge, etc) leads to richer and […]

September 18, 2017 05:06 PM


Zato Blog

Building a protocol-agnostic API for SMS text messaging with Zato and Twilio

This blog post discusses an integration scenario that showcases a new feature in Zato 3.0 - SMS texting with Twilio.

Use-case

Suppose you'd like to send text messages that originate from multiple sources, from multiple systems communicating natively over different protocols, such as the most commonly used ones:

Naturally, the list could grow but the main points are that:

The solution is to route all the messages through a dedicated Zato service that will:

Let's say that the system that we are building will send text messages informing customers about the availability of their order. For simplicity, only REST and AMQP will be shown below but the same principle will hold for other protocols that Zato supports.

Screenshots

Code

# -*- coding: utf-8 -*-

from __future__ import absolute_import, division, print_function, unicode_literals

# Zato
from zato.server.service import Service

class SMSAdapter(Service):
    """ Sends template-based text messages to users given on input.
    """
    name = 'sms.adapter'

    class SimpleIO:
        input_required = ('user_name', 'order_no')

    def get_phone_number(self, user_name):
        """ Returns a phone number, e.g. +1234567890, by user_name.
        In practice, this would be read from a database or cache.
        """
        users = {
            'mary.major': '+15550101',
            'john.doe': '+15550102',
        }

        return users[user_name]

    def handle(self):

        # In a real system there would be more templates,
        # perhaps in multiple natural languages, and they would be stored in files
        # on disk instead of directly in the body of a service.
        template = "Hello, we are happy to let you know that" \
            "your order #{order_no} is ready for pickup."

        # Get phone number from DB
        phone_number = self.get_phone_number(self.request.input.user_name)

        # Convert the template to an actual message
        msg = template.format(order_no=self.request.input.order_no)

        # Get connection to Twilio
        sms = self.out.sms.twilio.get('My SMS')

        # Send messages
        sms.conn.send(msg, to=phone_number)

In reality, the code would contain more logic, for instance to look up users in an SQL or Cassandra database or to send messages based on different templates, but to illustrate the point the service above will suffice.

Note that the service uses SimpleIO which means it can be used with both JSON and XML even if only the former is used in the example.

This is the only piece of code needed and the rest is simply configuration in web-admin described below.

Channel configuration

The service needs to be mounted on channels - in this scenario it will be HTTP/REST and AMQP ones but could be any other as required in a given integration project.

In all cases, however, no changes to the code are needed in order to support additional protocols - assigning a service to a channel is merely a matter of additional configuration without any coding.


Twilio configuration

Fill out the form in Connections -> SMS -> Twilio to get a new connection to Twilio SMS messaging facilities. Account SID and token are the same values that you are given by Twilio for your account.

Default from is useful if you typically send messages from the same number or nickname. On the other hand, default to is handy if the recipient is usually the same for all messages sent. Both of these values can always be overridden on a per-call basis.


Invocation samples

We can now invoke the service from both curl and RabbitMQ's GUI:
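For instance (this example is not part of the original post), assuming the HTTP channel created above is mounted at /sms/adapter on a local Zato environment listening on port 11223, a quick REST test from Python could look roughly like this:

# A minimal sketch of calling the REST channel with the requests library.
# The URL, port and payload values are assumptions; they depend entirely
# on how the HTTP channel was configured in web-admin.
import requests

response = requests.post(
    'http://localhost:11223/sms/adapter',
    json={'user_name': 'mary.major', 'order_no': '1234'})

print(response.status_code, response.text)

An equivalent curl command simply POSTs the same JSON document to the same address, and publishing that document to the queue used by the AMQP channel triggers the same service.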


Result

In either case, the result is a text message delivered to the intended recipient :-)


There is more!

It is frequently very convenient to test connections without actually having to develop any code - this is why SMS Twilio connections offer a form to do exactly that. Just click on 'Send a message', fill in your message, click Submit and you're done!


Summary

Authoring API services, including ones that send text messages with Twilio, is an easy matter with Zato.

Multiple input protocols are supported out of the box and you can rest assured that API keys and other credentials never sprawl throughout the infrastructure; everything is contained in a single place.

September 18, 2017 04:48 PM


DataCamp

How Not To Plot Hurricane Predictions

Visualizations help us make sense of the world and allow us to convey large amounts of complex information, data and predictions in a concise form. Expert predictions that need to be conveyed to non-expert audiences, whether they be the path of a hurricane or the outcome of an election, always contain a degree of uncertainty. If this uncertainty is not conveyed in the relevant visualizations, the results can be misleading and even dangerous.

Here, we explore the role of data visualization in plotting the predicted paths of hurricanes. We explore different visual methods to convey the uncertainty of expert predictions and the impact on layperson interpretation. We connect this to a broader discussion of best practices with respect to how news media outlets report on both expert models and scientific results on topics important to the population at large.

No Spaghetti Plots?

We have recently seen the damage wreaked by tropical storm systems in the Americas. News outlets such as the New York Times have conveyed a great deal of what has been going on using interactive visualizations for Hurricanes Harvey and Irma, for example. These include geographical visualizations of the percentage of people without electricity, the amount of rainfall, the amount of damage and the number of people in shelters, among many other things.

One particular type of plot has understandably been coming up recently and raising controversy: how to plot the predicted path of a hurricane, say, over the next 72 hours. There are several ways to visualize predicted paths, each way with its own pitfalls and misconceptions. Recently, we even saw an article in Ars Technica called Please, please stop sharing spaghetti plots of hurricane models, directed at Nate Silver and fivethirtyeight.

In what follows, I'll compare three common ways, explore their pros and cons and make suggestions for further types of plots. I'll also delve into why these types are important, which will help us decide which visual methods and techniques are most appropriate.

Disclaimer: I am definitively a non-expert in meteorological matters and hurricane forecasting. But I have thought a lot about visual methods to convey data, predictions and models. I welcome and actively encourage the feedback of experts, along with that of others.

Visualizing Predicted Hurricane Paths

There are three common ways of creating visualizations for predicted hurricane paths. Before talking about them, I want you to look at them and consider what information you can get from each. Do your best to interpret what each of them is trying to tell you, in turn, and then we'll delve into what their intentions are, along with their pros and cons:

The Cone of Uncertainty

From the National Hurricane Center

Spaghetti Plots (Type I)

From South Florida Water Management District via fivethirtyeight

Spaghetti Plots (Type II)

From The New York Times. Surrounding text tells us 'One of the best hurricane forecasting systems is a model developed by an independent intergovernmental organization in Europe, according to Jeff Masters, a founder of the Weather Underground. The system produces 52 distinct forecasts of the storm’s path, each represented by a line [above].'

Interpretation and Impact of Visualizations of Hurricanes' Predicted Paths

The Cone of Uncertainty

The cone of uncertainty, a tool used by the National Hurricane Center (NHC) and communicated by many news outlets, shows us the most likely path of the hurricane over the next five days, given by the black dots in the cone. It also shows how certain they are of this path. As time goes on, the prediction is less certain and this is captured by the cone, in that there is an approximately 66.6% chance that the centre of the hurricane will fall in the bounds of the cone.

Was this apparent from the plot itself?

It wasn't to me initially and I gathered this information from the plot itself, the NHC's 'about the cone of uncertainty' page and weather.com's demystification of the cone post. There are three more salient points, all of which we'll return to:

  • It is a common initial misconception that the widening of the cone over time suggests that the storm will grow;
  • The plot contains no information about the size of the storm, only about the potential path of its centre, and so is of limited use in telling us where to expect, for example, hurricane-force winds;
  • There is essential information contained in the text that accompanies the visualization, as well as the visualization itself, such as the note placed prominently at the top, '[t]he cone contains the probable path of the storm center but does not show the size of the storm...'; when judging the efficacy of a data visualization, we'll need to take into consideration all its properties, including text (and whether we can actually expect people to read it!); note that interactivity is a property that these visualizations do not have (but maybe should).

Spaghetti Plots (Type I)

Type I spaghetti plots show several predictions in one plot. On any given Type I spaghetti plot, the visualized trajectories are predictions from models from different agencies (NHC, the National Oceanic and Atmospheric Administration and the UK Met Office, for example). They are useful in that, like the cone of uncertainty, they inform us of the general region that may be in the hurricane's path. They are wonderfully unuseful, and actually misleading, in that they weight each model (or prediction) equally.

In the Type I spaghetti plot above, there are predictions with varying degrees of uncertainty from agencies whose past predictions have had variable degrees of success. So some paths are more likely than others, given what we currently know. This information is not present. Even more alarmingly, some of the paths are barely even predictions. Take the black dotted line XTRP, which is a straight-line extrapolation of the storm's current trajectory. This is not even a model. Eric Berger goes into more detail in this Ars Technica article.

Essentially, Type I spaghetti plots provide an ensemble model (compare with aggregate polling). Yet a key aspect of ensemble models is that each model is given an appropriate weight, and these weights need to be communicated in any data visualization. We'll soon see how to do this using a variation on Type I.

Spaghetti Plots (Type II)

Type II spaghetti plots show many, say 50, different realizations of a given model. The point is that if we simulate (run) a model several times, it will give a different trajectory each time. Why? Nate Cohen put it well in The Upshot:

"It’s really tough to forecast exactly when a storm will make a turn. Even a 15- or 20-mile difference in when it turns north could change whether Miami is hit by the eye wall, the fierce ring of thunderstorms that include the storm’s strongest winds and surround the calmer eye."

These are perhaps my favourite of the three for several reasons:

  • By simulating multiple runs of the model, they provide an indication of the uncertainty underlying each model;
  • They give a picture of relative likelihood of the storm centre going through any given location. Put simply, if more of the plotted trajectories go through location A than through location B, then under the current model it is more likely that the centre of the storm will go through location A;
  • They are unlikely to be misinterpreted (at least compared to the cone of uncertainty and Type I spaghetti plots). All the words required on the visualization are 'Each line represents one forecast of Irma's path'.

One con of Type II is that they are not representative of multiple models but, as we'll see, this can be altered by combining them with Type I spaghetti plots. Another con is that they, like the others, only communicate the path of the centre of the storm and say nothing about its size. Soon we'll also see how we can remedy this. Note that the distinction between Type I and Type II spaghetti plots is not one that I have found in the literature, but one that I created because these plots have such different interpretations and effects.

For the time being, however, note that we've been discussing the efficacy of certain types of plots without explicitly discussing their purpose, that is, why we need them at all. Before going any further, let's step back a bit and try to answer the question 'What is the purpose of visualizing the predicted path of a hurricane?' Performing such ostensibly naive tasks is often illuminating.

Why Plot Predicted Paths of Hurricanes?

Why are we trying to convey the predicted path of a tropical storm? I'll provide several answers to this in a minute.

But first, let me say what these visualizations are not intended for. We are not using these visualizations to help people decide whether or not to evacuate their homes or towns. Ordering or advising evacuation is something that is done by local authorities, after repeated consultation with experts, scientists, modelers and other key stakeholders.

The major point of this type of visualization is to allow the general populace to be as well-informed as possible about the possible paths of the hurricane and allow them to prepare for the worst if there's a chance that where they are or will be is in the path of destruction. It is not to unduly scare people. As weather.com states with respect to the function of the cone of uncertainty, '[e]ach tropical system is given a forecast cone to help the public better understand where it's headed' and '[t]he cone is designed to show increasing forecast uncertainty over time.'

To this end, I think that an important property would be for a reader to be able to look at it and say 'it is very likely/likely/50% possible/not likely/very unlikely' that my house (for example) will be significantly damaged by the hurricane.

Even better, to be able to say "There's a 30-40% chance, given the current state-of-the-art modeling, that my house will be significantly damaged".

Then we have a hierarchy of what we want our visualization to communicate:

  • At a bare minimum, we want civilians to be aware of the possible paths of the hurricane.
  • Then we would like civilians to be able to say whether it is very likely, likely, unlikely or very unlikely that their house, for example, is in the path.
  • Ideally, a civilian would look at the visualization and be able to read off quantitatively what the probability (or range of probabilities) of their house being in the hurricane's path is.

On top of this, we want our visualizations to be neither misleading nor easy to misinterpret.

The Cone of Uncertainty versus Spaghetti Plots

All three methods perform the minimum required function, to alert civilians to the possible paths of the hurricane. The cone of uncertainty does a pretty good job at allowing a civilian to say how likely it is that a hurricane goes through a particular location (within the cone, it's about two-thirds likely). At least qualitatively, Type II spaghetti plots also do a good job here, as described above, 'if more of the trajectories go through location A than through location B, then under the current model it is more likely that the centre of the storm will go through location A'.

If you plot 50 trajectories, you get a sense of where the centre of the storm will likely be, that is, if around half of the trajectories go through a location, then there's an approximately 50% chance (according to our model) that the centre of the storm will hit that location. None of these methods yet perform the 3rd function and we'll see below how combining Type I and Type II spaghetti plots will allow us to do this.

The major problem with the cone of uncertainty and Type I spaghetti plots is that the cone of uncertainty is easy to misinterpret (many people read the cone as a growing storm and do not appreciate the role of uncertainty) and that Type I spaghetti plots are misleading (they make all models look equally believable). These plots therefore don't satisfy the basic requirement that 'we want our visualizations to be neither misleading nor easy to misinterpret.'

Best Practices for Visualizing Hurricane Prediction Paths

Type II spaghetti plots are the most descriptive and the least open to misinterpretation. But they do fail at presenting the results of all models. That is, they don't aggregate over multiple models like we saw in Type I.

So what if we combined Type I and Type II spaghetti plots?

To answer this, I did a small experiment using python, folium and numpy. You can find all the code here.

I first took one of the NHC's predicted paths for Hurricane Irma from last week, added some random noise and plotted 50 trajectories. Note that, once again, I am a non-expert in all matters meteorological. The noise that I generated and added to the predicted signal/path was not based on any models and, in a real use case, would come from the models themselves (if you're interested, I used Gaussian noise). For the record, I also found it difficult to find data concerning any of the predicted paths reported in the media. The data I finally used I found here.
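To make the procedure concrete, here is a rough, self-contained sketch of the idea; it is not the code from the repository linked above, and the waypoints and the noise scale below are invented purely for illustration:

# Type II spaghetti sketch: perturb one predicted path with Gaussian noise
# and draw each perturbed trajectory on a folium map.
import numpy as np
import folium

# Hypothetical predicted path of the storm centre as (lat, lon) waypoints.
predicted_path = np.array([
    [16.8, -59.8], [18.0, -63.0], [19.5, -66.0],
    [21.0, -69.0], [22.5, -72.5], [24.0, -76.0],
])

# Noise grows with the forecast horizon to mimic increasing uncertainty.
scale = np.linspace(0.1, 1.0, len(predicted_path))[:, None]

m = folium.Map(location=[21.0, -68.0], zoom_start=5)

for _ in range(50):
    noisy = predicted_path + np.random.normal(0, 0.5, predicted_path.shape) * scale
    folium.PolyLine(noisy.tolist(), color='blue',
                    weight=1, opacity=0.3).add_to(m)

m.save('spaghetti_type_ii.html')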

Here's a simple Type II spaghetti plot with 50 trajectories:

But these are possible trajectories generated by a single model. What if we had multiple models from different agencies? Well, we can plot 50 trajectories from each:

One of the really cool aspects of Type II spaghetti plots is that, if we plot enough of them, each trajectory becomes indistinct and we begin to see a heatmap of where the centre of the hurricane is likely to be. All this means is that the more blue in a given region, the more likely it is for the path to go through there. Zoom in to check it out.

Moreover, if we believe that one model is more likely than another (if, for example, the experts who produced that model have produced far more accurate models previously), we can weight these models accordingly via, for example, transparency of the trajectories, as we do below. Note that weighting these models is a task for an expert and an essential part of this process of aggregate modeling.
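As a toy illustration of weighting by transparency (again, this is not the code behind the plots in this post; the two paths, the offset between them and the weights are made up):

# Two hypothetical models weighted by line transparency: the more trusted
# model is drawn with more opaque trajectories.
import numpy as np
import folium

def add_trajectories(fmap, path, n, color, opacity, noise_scale=0.5):
    """Draw n noisy copies of path on fmap with the given color and opacity."""
    for _ in range(n):
        noisy = path + np.random.normal(0, noise_scale, path.shape)
        folium.PolyLine(noisy.tolist(), color=color,
                        weight=1, opacity=opacity).add_to(fmap)

path_a = np.array([[16.8, -59.8], [19.5, -66.0], [22.5, -72.5]])  # model A
path_b = path_a + np.array([0.0, 1.5])                            # model B, shifted east

m = folium.Map(location=[20.0, -66.0], zoom_start=5)
add_trajectories(m, path_a, 50, 'blue', opacity=0.35)  # weight ~0.7
add_trajectories(m, path_b, 50, 'red', opacity=0.15)   # weight ~0.3
m.save('weighted_spaghetti.html')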

What the above does is solve the tasks required by the first two properties that we want our visualizations to have. To achieve the 3rd, a reader being able to read off that it's, say 30-40% likely for the centre of a hurricane to pass through a particular location, there are two solutions:

  • to alter the heatmap so that it moves between, say, red and blue, and include a key that says, for example, that red means a probability of greater than 90%;
  • to transform the heatmap into a contour map that shows regions in which the probability takes on certain values.

Also do note that this will only tell somebody the probability that a given location will be hit by the hurricane's centre. You could combine (well, convolve) this with information about the size of the hurricane to transform the heatmap into one showing the probability of a location being hit by hurricane-force winds. If you'd like to do this, go and hack around with the code that I wrote to generate the plots above (I plan to write a follow-up post doing this and walking through the code).

Visualizing Uncertainty and Data Journalism

What can we take away from this? We have explored several types of visualization methods for predicted hurricane paths, discussed the pros and cons of each and suggested a way forward for more informative and less misleading plots of such paths, plots that communicate not only the results but also the uncertainty around the models.

This is part of a broader conversation that we need to be having about reporting uncertainty in visualizations and in data journalism in general. We need to actively participate in conversations about how experts report uncertainty to civilians via news media outlets. Here's a great piece from The Upshot demonstrating what the jobs report could look like due to statistical noise alone, even if jobs were steady. Here's another Upshot piece showing the role of noise and uncertainty in interpreting polls. I'm well aware that we need headlines to sell news, and of the role of click-bait in the modern news media landscape, but we need to communicate not merely results but also the uncertainty around those results, so as not to mislead the general public and potentially ourselves. Perhaps more importantly, the education system needs to shift and equip all civilians with the data literacy and statistical literacy needed to deal with this movement into the data-driven age. We can all contribute to this.

September 18, 2017 04:32 PM


Python Software Foundation

Improving Python and Expanding Access: How the PSF Uses Your Donation


The PSF is excited to announce its first ever membership drive beginning on September 18th!  Our goal for this inaugural drive is to raise $4,000.00 USD in donations and sign up 3,000 new members in 30 days.

If you’ve never donated to the PSF, if you’ve let your membership lapse, or if you’ve thought about becoming a Supporting Member, here is your chance to make a difference.

Join the PSF as a Supporting Member or Donate to the PSF


You can donate as an individual or join the PSF as a Supporting Member. Supporting members pay $99.00 USD per year to help sustain the Foundation and support the Python community. Supporting members are also eligible to vote for candidates for the PSF Board of Directors, changes in the PSF bylaws, and other matters related to the infrastructure of the foundation.

To become a supporting member or to make a donation, click on the widget here and follow the instructions at the bottom of the page.

We know many of you already make a great effort to support us; you volunteer your time to help us keep our website going, you join working groups to help with marketing, sponsorship, grant requests, trademarks, Python education, and packaging. Even more, you help the PSF put on PyCon US, a conference we couldn’t do without the help of our volunteers. The collective efforts and contributions of our volunteers help drive our work. We will forever be grateful to the people who step forward and ask, “What can I do to help advance open source technology related to Python?”

We understand that not everyone has the time to volunteer, but perhaps you’re in a position to help financially.


We’re asking those who are able to donate money to support sprints, meetups, and community events. Donations support Python documentation, fiscal sponsorships, software development, and community projects. They help fund the critical tools programmers use every day.

If you're not in a position to contribute financially, that's ok. Basic membership is free and we welcome anyone who would like to join at this level. Register here to create your member account, log back in, then complete the form to become a basic member.


What does the PSF do?


Here is what one of our sponsors has to say about why they contribute to the PSF:

“Work on stuff that matters is one of O’Reilly’s core principles, and we know how very much open source matters. The open source community spurs innovation, shares knowledge, encourages growth, and creates industries. The Python Software Foundation is a prime example of the power of open source, showing how focused, thoughtful, and consistent efforts can create a community whose impact extends far beyond meetups and lines of code. O’Reilly is proud to continue to sponsor this great foundation.”
-- Rachel Roumeliotis, Vice President at O’Reilly Media and Chair of OSCON

Lastly, if you’d like to share the news about the PSF’s Membership drive, please share a tweet via the tweet button here:




Or share a tweet with the following text:

Donation & Membership Drive @ThePSF. Help us raise $4K and register 3K new members in 30 days! http://bit.ly/2h3dxpb #idonatedtothepsf


We at the PSF want to thank you for all that you do. Your support is what makes the PSF possible.

September 18, 2017 04:02 PM


Michy Alice

Putting some of my Python knowledge to a good use: a Reddit reading bot!

One of the perks of knowing a programming language is that you can build your own tools and applications. Depending on what you need, it may even be a fast process since you usually do not need to write production grade code and a detailed documentation (although it might still be helpful in the future).

I’ve got used to reading news on Reddit; however, it can sometimes be a bit time consuming, since it tends to keep you wandering down each and every rabbit hole that pops up. This is fine if you are commuting and have some spare time to spend browsing the web, but sometimes I just need a quick glance at what’s new and relevant to my interests.

In order to automate this search process, I’ve written a bot that takes as input a list of subreddits, a list of keywords and some flags, and browses each subreddit looking for the given keywords.

If a keyword is found either in the body or in the title of a post submitted to one of the selected subreddits, the post title and links are either printed to the console or saved to a file (in which case the file name must be supplied when starting the search).

The bot is written using praw.
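In outline, the core search loop with praw looks something like the following simplified sketch (this is not the bot’s actual code; the credentials and the subreddit/keyword lists are placeholders):

# Simplified sketch of the search loop; real credentials would live in
# config_data.py as described below.
import praw

reddit = praw.Reddit(client_id='your_client_id',
                     client_secret='your_client_secret',
                     username='your_username',
                     password='your_password',
                     user_agent='reddit reading bot')

subreddits = ['python']
keywords = ['pycon']

for name in subreddits:
    for post in reddit.subreddit(name).new(limit=80):
        text = (post.title + ' ' + post.selftext).lower()
        if any(keyword.lower() in text for keyword in keywords):
            print(post.title, post.url)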


What do I need to use the bot?

In order to use the bot you’ll need to set up an app using your Reddit account and save the client_id, client_secret, username and password in a file named config_data.py which should be stored in the same folder as the reddit_browsing_bot_main.py script.
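A hypothetical config_data.py would then simply hold those four values (the variable names follow the list above; the values are placeholders for your own app’s credentials):

# config_data.py - fill in the credentials generated for your Reddit app.
client_id = 'your_client_id'
client_secret = 'your_client_secret'
username = 'your_reddit_username'
password = 'your_reddit_password'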

How does the bot work?

The bot is designed to be a command line application and can be used either in a Linux terminal or in PowerShell if you are the Windows type ;)

Adopting a CLI was undoubtedly a bad choice if I wanted other people to use the application, but in my case I am the end user, and I like command line tools. A lot.

For each subreddit entered, the bot checks whether that subreddit exists; if it doesn’t, it is discarded. Then, within each subreddit, the bot searches the first -l posts and returns those that contain at least one keyword.

This is an example of use:

reddit_browsing_bot_main.py -s python -k pycon -l 80 -f new -o output.txt -v

In the example above I am searching the first 80 posts in the “new” section of the python subreddit for posts that mention pycon. The -o flag tells the program to output the results of the search to the output.txt file. The -v flag makes the program also print the output to the console.

You can search more subreddits and/or use more keywords; just separate each new subreddit/keyword with a comma. If you do not supply an output file, the program will just output the results to the console.

Type:

reddit_browsing_bot_main.py -h

for a help menu. Maybe in the future I’ll add some features, but for now this is pretty much it.

Is it ok to use the bot?

As far as I know, the bot does not violate any of the terms of Reddit’s API. The API calls are also rate-limited by the praw module in order to comply with Reddit’s API limits. The bot is not downvoting nor upvoting any post; it just reads what is online. Anyway, should you want to check the code yourself, it is available on my GitHub. I’ve also copied and pasted a gist below so that you can have a look at the code here:

September 18, 2017 03:26 PM


Possbility and Probability

The curse of knowledge: Finding os.getenv()

Recently I was working with a co-worker on an unusual nginx problem. While working on the nginx issue we happened to look at some of my Python code. My co-worker normally does not do a lot of Python development, she … Continue reading

The post The curse of knowledge: Finding os.getenv() appeared first on Possibility and Probability.

September 18, 2017 02:24 PM


DataCamp

DataCamp and Springboard Are Working Together To Get You a Data Science Job!

DataCamp and Springboard are coming together to advance learning and career outcomes for aspiring data scientists.  

Joining forces was an obvious choice. Springboard’s human-centered approach to online learning perfectly complemented DataCamp’s expertise in interactive learning exercises. Together, we’ve created the Data Science Career Track, the first mentor-led data science bootcamp to come with a job guarantee. 

Each student in the Data Science Career Track will be assigned a personal industry mentor who’ll advise them on technical skills, project execution, and career advancement. Springboard’s expert-curated data science curriculum will be paired with DataCamp’s interactive exercises for a seamless learning experience. Finally, a career coach will work with students on interview skills, resume building, and personalized job searches to help them find the ideal data science position. 

The course is selective: about 18% of applicants are allowed to enroll after going through the admission process.

For eligible students, the course guarantees that you’ll find a job within six months after graduation or your money back.  

For a limited time only (until October 16th), you can use the code LOVEDATA to get $200 off the Data Science Career Track. Click here for more information.

September 18, 2017 01:18 PM


Doug Hellmann

gc — Garbage Collector — PyMOTW 3

gc exposes the underlying memory management mechanism of Python, the automatic garbage collector. The module includes functions for controlling how the collector operates and to examine the objects known to the system, either pending collection or stuck in reference cycles and unable to be freed. Read more… This post is part of the Python Module …
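As a quick taste of the kind of functions the excerpt mentions (this snippet is not taken from the PyMOTW article itself):

# Inspect and drive the collector with a few of gc's functions.
import gc

print(gc.get_threshold())   # allocation thresholds for the three generations
print(gc.get_count())       # objects currently tracked in each generation

unreachable = gc.collect()  # force a full collection
print('unreachable objects found:', unreachable)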

September 18, 2017 01:00 PM