skip to navigation
skip to content

Planet Python

Last update: February 26, 2020 07:48 AM UTC

February 26, 2020


PyBites

How to Write a Guest Article for PyBites

Hello Everybody! In this article I'll run through the procedure of using git and github to submit a guest article to PyBites.

1. Forking the Repository

In order to get started, you have to create a fork of the original repository. A fork is a copy of a project to your own Github account.

This way, you can work on your own copy and ask the project maintainer to accept some modifications on your copy into the main repository.

Head over to the PyBites blog Github repo and look for the Fork icon in the top right, I've marked it in green:

Github Fork

Git will start creating the copy, and when it's done, you'll have your very own pybites.github.io-src fork!

Github Forked

2. Cloning your copy

Now that we have our running fork, we can clone this from our own Github account.

A clone is a local copy of a project from your Github to your local machine, where you can start editing!

Git Clone

3. Preparing for the future

While you are making your edits or adding your article, the maintainers, or other contributors may be doing the same! Git already knows of our origin, it knows where we cloned from.

We also need to tell it what our upstream is, where did we fork from? Which is where the maintainers will be adding updates.

In this case that's https://github.com/pybites/pybites.github.io-src

Git Remote

So now, if we want to get the maintainers latest modifications and add them to our own repository, we can first pull the upstream master branch, and then push it to our origin master branch!

Git pull and push

4. Before making modifications

The Gittiquette says that you should make a branch before starting any modifications.

A branch is a copy of our local copy at a certain point in time that will go on to live it's on life, and eventually (hopefully) rejoin with it's master somewhere down the road.

You can either create a branch while checking it out, or you can create a branch, and later switch to it with checkout.

Git branch

You might want to set your branch name to something that you can recognize. Some trees have a lot of branches!

I'm currently writing this "how to add guest articles", so I'll give mine an obvious name:

My branch name

Branching is mainly important when you want to verify that your new modifications are not conflicting with any new changes in the upstream.

5. Writing your article

Post directory and style

Guest articles are stored in content, are written in markdown and usually start with guest-

Guest posts

I've added a template you can use here

It contains the structure of most articles and some basic markdown syntax as example, [here]'s another cheat sheet for if you want to do even more!

You can also look at the other articles as an example.

Adding images

If you have images you want to add (screenshots, infographics,...), you can add them under content/images/ and refer to them in your article like this:

![Image Alt Text](images/your-image.png)

A new guest!

If this is your first article, you should also add a small bio in content/pages/guests.md, right above the comment:

<!-- insert your new bio above this line -->

You can copy one from the guests file yourself, or use the template for the bio.

6. Adding and commiting your changes

Commit's are collections of small changes you have made. Adding your article could probably go in a single commit. You're going to tell git that you've changed or added a bunch of files first:

git add

...and then you're gonna store these changes in a commit:

Commit Some

I still have some work to do, but when you're ready, you can push your modified branch back to the branch on it's origin you've created for the modifications:

Push your changes

And you're all set!

If you're logged in and you go to the upstream github page, you'll notice github automatically detects you've pushed a branch that was related to this project!

Upstream Knows

And you can create a pull request, which effectively asks the maintainer to pull the changes you've made into the master branch.

Pull request

Now all you have to do is wait for someone to review your changes and if they are approved, your article will soon be ours to read on the PyBites blog!

--

Thanks for reading, I hope you enjoyed it as much as I enjoyed writing it. If you have any remarks or questions, you can likely find me on the Pybites Slack Channel as 'Jarvis'.

Keep calm and code in Python!

-- Cedric

February 26, 2020 07:21 AM UTC


The No Title® Tech Blog

Book review - Machine Learning with Python for Everyone, By Mark E. Fenner

Machine learning, one of the hottest tech topics of today, is being used more and more. Sometimes as the best tool for the job, other times perhaps as a buzzword that is mainly used as a way to make a product look cooler. However, without knowing what ML is and how it works behind the scenes, it’s very easy to get lost. But this book does a great job in guiding you all the way up from very simple math concepts to some sophisticated machine learning techniques.

February 26, 2020 02:10 AM UTC

February 25, 2020


Roberto Alsina

Episodio 24: I like Windows!

Me puse a hacer un window manager de juguete en Python. Lo basé en HackWM. Hack. Hacker. Carlín! Se llama carlin_wm y realmente no sirve para nada, excepto como experiencia educativa. O sea, es re útil... para mí!

February 25, 2020 11:02 PM UTC


Codementor

Build Systems with Speed and Confidence by Closing the Loop First!

A completely finished “loop” is when you can provide the required input to your system, and it produces the desired output (or side effects, if that’s how you like it). The “Close the loop first” technique is about closing this loop as fast as possible by creating a barebones version of it first, providing all or some required inputs, and generating a partial form of the desired output. Once we have closed this barebones loop, we can then begin implementing behaviours from the inside out, so that with each new change our loop starts looking more like the actual system we want.

February 25, 2020 08:48 PM UTC


Python Insider

Python 3.8.2 and 3.9.0a4 are now available

On behalf of the entire Python development community, and the currently serving Python release team in particular, I’m pleased to announce the release of two of the latest Python editions.

Python 3.8.2

Python 3.8.2 is the second maintenance release of Python 3.8 and contains two months worth of bug fixes. Detailed information about all changes made in 3.8.2 can be found in its change log. Note that compared to 3.8.1, version 3.8.2 also contains the changes introduced in 3.8.2rc1 and 3.8.2rc2.

The Python 3.8 series is the newest feature release of the Python language, and it contains many new features and optimizations. You can find Python 3.8.2 here:
https://www.python.org/downloads/release/python-382/

See the What’s New in Python 3.8document for more information about features included in the 3.8 series.

Maintenance releases for the 3.8 series will continue at regular bi-monthly intervals, with 3.8.3 planned for April 2020 (at the PyCon US sprints).

Python 3.9.0a4

An early developer preview of Python 3.9 is also ready:
https://www.python.org/downloads/release/python-390a4/

Python 3.9 is still in development. This releasee, 3.9.0a4 is the fourth of six planned alpha releases. Alpha releases are intended to make it easier to test the current state of new features and bug fixes and to test the release process. During the alpha phase, features may be added up until the start of the beta phase (2020-05-18) and, if necessary, may be modified or deleted up until the release candidate phase (2020-08-10). Please keep in mind that this is a preview release and its use is not recommended for production environments.

We hope you enjoy both!

Thanks to all of the many volunteers who help make Python Development and these releases possible! Please consider supporting our efforts by volunteering yourself or through organization contributions to the Python Software Foundation.
https://www.python.org/psf/

Your friendly release team,

Ned Deily
Steve Dower
Łukasz Langa

February 25, 2020 08:02 PM UTC


PyCoder’s Weekly

Issue #409 (Feb. 25, 2020)

#409 – FEBRUARY 25, 2020
View in Browser »

The PyCoder’s Weekly Logo


Analysing NBA Assists: How to Visualize Hidden Relationships in Data With Python

Using basketball as the background setting, the author discusses several different strategies for uncovering relationships and producing beautiful visualizations with Python.
JP HWANG

PyCon US 2020 Packaging Summit: Registration and Topic Proposal

Registration is open for the PyCon US 2020 Packaging Summit. Topic proposals are also being accepted. Both registration and topic proposals close on March 7, 2020.
PYTHON.ORG

Python Developers Are in Demand on Vettery

alt

Vettery is an online hiring marketplace that’s changing the way people hire and get hired. Ready for a bold career move? Make a free profile, name your salary, and connect with hiring managers from top employers today →
VETTERY sponsor

Django Security Vulnerability: CVE-2020-7471

Django 1.11 before 1.11.28, 2.2 before 2.2.10, and 3.0 before 3.0.3 allows SQL Injection if untrusted data is used as a StringAgg delimiter (e.g., in Django applications that offer downloads of data as a series of rows with a user-specified column delimiter).
MITRE.ORG

Working With PDFs in Python

In this step-by-step course, you’ll learn how to work with a PDF in Python. You’ll see how to extract metadata from preexisting PDF files. You’ll also learn how to merge, split, watermark, and rotate pages in PDFs using Python and the PyPDF2 library.
REAL PYTHON video

Python in Production

Hynek Schlawack feels that discussions of Python web applications in production are missing from Python conferences. He is offering to mentor people who are interested in proposing conference talks on the subject
HYNEK SCHLAWACK

Null in Python: Understanding Python’s NoneType Object

Learn about the NoneType object None, which acts as the “null” in Python. This object represents emptiness, and you can use it to mark default parameters and even show when you have no result.
REAL PYTHON

PEP 584 PR Merged (Dictionary Union)

This will add the following dictionary operations: dict1 | dict2 (copy + update) and dict1 |= dict2 (update). See PEP 584 for example use cases.
GITHUB.COM/PYTHON

Discussions

Scene From Werner Herzog’s “Programming in Python” ;-)

“I see the lie in front of me – import time, and I am appalled – how can a machine offer such a promise, such a lie, the ability to import time as if it were a simple commodity. Once again, the vile snake has bitten me.”
TWITTER.COM/MVATTUONE

Python Jobs

Senior Python/Django Software Engineer (London, UK)

Zego

Python Developer (Malta)

Gaming Innovation Group

Senior Python Software Engineer (London, UK)

Tessian

Senior Software Engineer Backend (Denver, CO)

CyberGRX

Senior Python Software Developer (Vancouver, BC, Canada)

AbCellera

More Python Jobs >>>

Articles & Tutorials

Pycel: Compiling Excel Spreadsheets to Python and Making Pretty Pictures [2011]

Author describes how he compiled Excel spreadsheets with formulas into Python code in order to optimize the calculations and visualize results. Very interesting read!
DIRK GORISSEN

Better Python Tracebacks With Rich

“I’ve never found Python tracebacks to be a great debugging aid beyond telling me what the exception was, and where it occurred. In a recent update to Rich, I’ve tried to refresh the humble traceback to give enough context to diagnose errors before switching back to the editor.”
WILL MCGUGAN

Monitor Python Application Metrics and Distribute Traces in Real Time With Datadog APM

alt

Datadog’s APM generates detailed flame graphs that will help your teams identify bottlenecks and latency. If an error is spotted, you can easily pivot to related logs and metrics in seconds to troubleshoot without switching tools or contexts. Visualize Python metrics end-to-end with a free trial →
DATADOG sponsor

Introduction to Python SQL Libraries

Learn how to connect to different database management systems by using various Python SQL libraries. You’ll interact with SQLite, MySQL, and PostgreSQL databases and perform common database queries using a Python application.
REAL PYTHON

A Brief Network Analysis of Symbolism in Blake’s Poetry

The author explains how she used the spaCy and NetworkX libraries to analyze William Blake’s 18th century poetry collection The Songs of Innocence and of Excellence.
MARTA PALANDRI

Python Packaging Metadata

“Since this topic keeps coming up, I’d like to briefly share my thoughts on Python package metadata because it’s – as always – more complex than it seems.”
HYNEK SCHLAWACK

How Python Became the Popular Choice

“With the popularity of Python with programmers still growing, we tried to understand how it became one of the most impactful languages in the world.”
JUN WU

How to Add a robots.txt to Your Django Site

robots.txt is a standard file to communicate to “robot” crawlers, such as Google’s Googlebot, which pages they should not crawl.
ADAM JOHNSON

Automate Your Dating Life With 100 Lines of Python

Author used a Python-based man-in-the-middle proxy to deconstruct network calls made by the Hinge app and then built a service to automatically “swipe right” on dating profiles.
ELI MERNIT

Learn Python for Data Science in 4 Weeks

Learn the foundational Python programming and statistics skills needed for a job in data science in as little as 4 weeks. Work 1:1 with a data science mentor to master the skills needed to get started in your journey to a data science role. Enroll in Springboard’s data science career track prep course today.
SPRINGBOARD sponsor

Managing Kindle Highlights With Python and GitHub

Author writes a Python script to build a GitHub repo for storing Kindle book highlights in an organized way.
DUARTE O.CARMO

How to Cheat at Unit Tests With Pytest and Black

Some tips for quickly writing rough initial implementations for test cases and then iterating on them.
SIMON WILLISON

Introduction to Image Processing in Python With OpenCV

MUHAMMAD JUNAID KHALID

Projects & Code

PayloadsAllTheThings: List of Useful Pentesting/CTF Payloads

GITHUB.COM/SWISSKYREPO

Carnets: Standalone Jupyter Notebooks Implementation for iOS

HOLZSCHU.GITHUB.IO

Pycel: Compile Excel Spreadsheets to Python Code & Visualize Them

GITHUB.COM/DGORISSEN

darknet.py: TOR Proxy Written in Python

GITHUB.COM/MULTIVERSECODER

dg: A Python With a Haskell Syntax

PYOS.GITHUB.IO

HiPlot: High-Dimensional Interactive Plots Made Easy

FACEBOOK.COM

Django Security: PyCharm Python Security Plugin

PYCHARM-SECURITY.READTHEDOCS.IO

DeepSpeed: Deep Learning Optimization Library

GITHUB.COM/MICROSOFT

Events

JupyterCon 2020

August 10–14 in Berlin, Germany.
NUMFOCUS.ORG

Python Vienna Meetup

February 29, 2020
MEETUP.COM

Python Mauritius User Group Meetup

February 29, 2020
MEETUP.COM

PyDelhi User Group Meetup

February 29, 2020
MEETUP.COM

Melbourne Python Users Group, Australia

March 2, 2020
J.MP


Happy Pythoning!
This was PyCoder’s Weekly Issue #409.
View in Browser »

alt

[ Subscribe to 🐍 PyCoder’s Weekly 💌 – Get the best Python news, articles, and tutorials delivered to your inbox once a week >> Click here to learn more ]

February 25, 2020 07:30 PM UTC


Data School

How to merge DataFrames in pandas (video)

How to merge DataFrames in pandas (video)

In my new pandas video, you're going to learn how to use the "merge" function so that you can combine multiple datasets into a single DataFrame.

Merging (also known as "joining") can be tricky to do correctly, which is why I'll walk you through the process in great detail. By the end of the video, you'll be fully prepared to merge your own DataFrames!

"This, by far, is the best explanation of these concepts." - M. Schuer

Click on a timestamp below to jump to a particular section:

1:21 Selecting a function (merge/join/concat/append)
3:36 Details of the merge process
12:07 Handling common merge issues
17:01 Comparing the four types of joins (inner/outer/left/right)

If you want to follow along with the code, you can download the Jupyter notebook and the datasets from GitHub.

Related Resources

If you have any questions, please let me know in the comments below!

February 25, 2020 04:56 PM UTC


Real Python

How to Work With a PDF in Python

The Portable Document Format or PDF is a file format that can be used to present and exchange documents reliably across operating systems. While the PDF was originally invented by Adobe, it is now an open standard that is maintained by the International Organization for Standardization (ISO). You can work with a preexisting PDF in Python by using the PyPDF2 package.

PyPDF2 is a pure-Python package that you can use for many different types of PDF operations.

By the end of this course, you’ll know how to:


[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

February 25, 2020 02:00 PM UTC


Programiz

Python Programming

February 25, 2020 08:54 AM UTC


Python Bytes

#170 Visualize this: Visualizing Python's visualization ecosystem

February 25, 2020 08:00 AM UTC


Podcast.__init__

Reducing The Friction Of Embedded Software Development With PlatformIO

Embedded software development is a challenging endeavor due to a fragmented ecosystem of tools. Ivan Kravets experienced the pain of programming for different hardware platforms when embroiled in a home automation project. As a result he built the PlatformIO ecosystem to reduce the friction encountered by engineers working with multiple microcontroller architectures. In this episode he describes the complexities associated with targeting multiple platforms, the tools that PlatformIO offers to simplify the workflow, and how it fits into the development process. If you are feeling the pain of working with different editing environments and build toolchains for various microcontroller vendors then give this interview a listen and then try it out for yourself.

Summary

Embedded software development is a challenging endeavor due to a fragmented ecosystem of tools. Ivan Kravets experienced the pain of programming for different hardware platforms when embroiled in a home automation project. As a result he built the PlatformIO ecosystem to reduce the friction encountered by engineers working with multiple microcontroller architectures. In this episode he describes the complexities associated with targeting multiple platforms, the tools that PlatformIO offers to simplify the workflow, and how it fits into the development process. If you are feeling the pain of working with different editing environments and build toolchains for various microcontroller vendors then give this interview a listen and then try it out for yourself.

Announcements

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With 200 Gbit/s private networking, node balancers, a 40 Gbit/s public network, and a brand new managed Kubernetes platform, all controlled by a convenient API you’ve got everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models, they’ve got dedicated CPU and GPU instances. Go to pythonpodcast.com/linode to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
  • You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Corinium Global Intelligence, ODSC, and Data Council. Upcoming events include the Strata Data in San Jose, and PyCon US in Pittsburgh. Go to pythonpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.
  • Your host as usual is Tobias Macey and today I’m interviewing Ivan Kravets about PlatformIO, an open source ecosystem for IoT development including a cross-platform IDE, unified debugger, remote unit testing, and firmware updates.

Interview

  • Introductions
  • How did you get introduced to Python?
  • Can you start by describing what PlatformIO is?
    • What was your motivation for creating it?
    • What are the aspects of embedded development that keep you interested and engaged in this space?
  • What are some of the types of projects that someone might use PlatformIO to build?
  • What are some of the common challenges that a developer might encounter when working on embedded systems?
    • What are the additional complexities that get introduced as more hardware targets get added to a project?
  • What is the workflow for someone using PlatformIO for embedded systems development?
  • What are the different elements of PlatformIO and how do they simplify the work of building embedded systems projects?
  • How is PlatformIO implemented and how has the system design evolved since you first began working on it?
    • What was your reason for selecting Python as the implementation language?
    • If you were to start over today what would you do differently?
  • How has the embedded hardware and software landscape changed since you first started work on PlatformIO?
    • How has that impacted your product direction?
  • How do developers handle testing and validation of their applications?
  • How does PlatformIO help with updating deployed devices with new firmware?
  • What have been some of the most interesting/unexpected/innovative projects that you have seen built with PlatformIO?
  • What have been some of the most interesting/unexpected/challenging aspects of building and maintaining PlatformIO?
  • How are you approaching sustainability of the project and business?
  • What do you have planned for the future of PlatformIO?

Keep In Touch

Picks

Closing Announcements

  • Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com) with your story.
  • To help other people find the show please leave a review on iTunes and tell your friends and co-workers
  • Join the community in the new Zulip chat workspace at pythonpodcast.com/chat

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

February 25, 2020 03:25 AM UTC


Catalin George Festila

Python 3.7.6 : The new concepts of execution in python 3 - part 001.

The main goal of these tutorials series is learning to deal with python source code using the new concepts of execution in python 3. When two or more events are concurrent it means that they are happening at the same time. Concurrent programming is not equivalent to parallel execution. In computing, concurrency is the execution of pieces of work or tasks by a computer at the same time.

February 25, 2020 02:55 AM UTC


Talk Python to Me

#253 Moon base geekout

This episode is a unique one. On this episode, I've invited Richard Campbell and developer and podcaster who also dives deep into science and tech topics. We are going to dig into his geekout series and spend some time talking realistically about moonbases and space travel.

February 25, 2020 12:00 AM UTC

February 24, 2020


PyBites

Productivity Mondays - Are You Producing Enough Value? 5 Tips to Boost Your Deep Work

Here is another edition of Productivity Mondays geared towards getting you closer towards your goals. This weekend I picked up Deep work again. Every time I read it is a revelation. The better you manage your time, the more successful you will become. It all comes down to the amount of value you can produce. And for that deep work is essential.

"If you don’t produce, you won’t thrive—no matter how skilled or talented you are." (Cal Newport - Deep Work)

Here are 5 tips to get more deep work done. Don't just read this, please comment below which tips you use or are going to use on a daily basis from now on. Or share ones of your own. Here we go:

  1. Plan out your week during the weekend, and your day the night before.

    Schedule large blocks of uninterrupted time on the tasks (80/20) that move you closer towards your goals. If you don't do this, it's so easy to let social media, unimportant meetings, and other interruptions take over your schedule.

    You are an ASSET, protect your time!

  2. Persistence. When you work on an important task sit with it till it's complete.

    This will not only increase your output / value, this will seriously boost your level of confidence.

  3. Environment is half of the battle.

    We find it's best to schedule your deep work early in the morning. Part of the world (kids!) is still asleep and as the day progresses, the amount of interrupts increases.

    It's also important to realize that willpower is a finite resource, you have way more of it earlier in the day!

    Ideally you do your one or two most important tasks (MITs) first. It's enormously empowering to cross those off of your list. It's also THE way to beat imposter syndrome because these tasks are often related to getting onto the court, playing the game!

    The other thing to pay close attention to is your physiology. For example:

    • Are you drinking enough water? No? Have a water bottle on your desk at all times.

    • Eating healthy food? No? Whatever is in your kitchen gets consumed, don't buy groceries when hungry.

    • Are you getting enough exercise? No? Get some steps in when you wake up and set a daily alarm to go to the gym (the correlation between exercise / fitness and overall performance is just too high to ignore this!)

    All these things make it easier to get the work done that will ultimately matter.

  4. Motivation. Ask yourself why:

    • Why am I doing this?
    • Who am I serving and why?
    • Why is this important to me?

    These are great questions to get back on track. Specially when you struggle with procrastination or you find yourself aimlessly clicking on Facebook.

    When that happens stop and grab a notebook. Go back to your WHY, maybe something is off in the stories you tell yourself (identify).

    Honestly I was already lining up two resources, Eat that frog! and Maker's Schedule, Manager's Schedule, which are great reads, but sometimes the best tool is a notebook or your journal. Remember, it all starts in your MIND. Thoughts -> actions -> results.

    "Who you are, what you think, feel, and do, what you love—is the sum of what you focus on." (Cal Newport - Deep Work)

  5. Review your weeks / months. Here is the template we use every weekend. Why is this important? A plethora of reasons but mainly because:

    • It keeps you focused on your goals and next actions you need to take
    • It lets you reflect on the week (be honest with yourself) and course correct, which prevents you from making the same mistakes again.

Now go crush it this week and share in the comments below how this week is going so far ...

-- Bob

With so many avenues to pursue in Python it can be tough to know what to do. If you're looking for some direction or want to take your Python code and career to the next level, book a strategy session with us. We can help you!

February 24, 2020 08:40 PM UTC

Talking to API's and goodlooking tools

One of my go-to locations for security news had a thread recently about a tool called VTScan. I really liked the idea of not having to go through the browser overhead to check files against multiple scan engines.

Although the tool (which is itself a basic vt-cli spinoff) already existed, I was looking for a new challenge, I decided to roll my own and add a few cool features! I'll have a thorough look at how python talks to API's with requests and I look at turning all this API data into a nice GUI application with click. I hope to give you some idea's for CLI styling in the future so I can see more awesome tools by you all!

You can find the full code on my github.

Index

Requirements

REST

What is REST?

According to Wikipedia:

Representational state transfer (REST) is a software architectural style that defines a set of constraints to be used for creating Web services. Web services that conform to the REST architectural style, called RESTful Web services, provide interoperability between computer systems on the Internet. RESTful Web services allow the requesting systems to access and manipulate textual representations of Web resources by using a uniform and predefined set of stateless operations. Other kinds of Web services, such as SOAP Web services, expose their own arbitrary sets of operations.

So in human language, a REST API is just a web-based endpoint that we can send HTTP requests to. This endpoint in turn will query an application on the backend and will return some data based on what the application does.

In our example, we will post a file to a webserver, and the webserver will send the file to a number of anti-virus scanners. The results of all these scans will be put in a report to indicate if a file has been flagged as a virus or not.

Note: API Key Protection

Before we get into the action, I want to leave a small note about protecting your API keys. Your API key is a unique identifier and authorization mechanism to allow you to access certain services (like REST API's) with just a single key.

If anyone manages to get a hold of this unique string, people WILL be able to query the service as if they are you.

Something very common is that developers accidentally push their code including API keys to github, and thus everyone can access the service as that developer.

Bob mentioned a common way to tackle this problem in his mentoring session digest where he uses os.getenv.

I personally tend to create a sensitive.py file, which I then add to my .gitignore.

This allows me to import my sensitive data like: from sensitive import APIKEY.

(Of course, the API key is still stored in a file, so Bob's way of using os.getenv is waaay more foolproof!)

Either way, hide those keys!

The Setup

  1. Get a free API key at VirusTotal by creating an account and getting the API-key from your profile:

    API Menu

  2. Install the required packages:

    pip3 install pywin32 click requests win10toast

  3. (Optional) Install the colorama package if you would like colors!

    pip3 install colorama

This package is used by click to draw terminal colors, but click can run perfectly without it.

The VirusTotal API

The first thing you should do when implementing an API that you don't know, is to Read the Manual!

A lot of times, sample code is provided to get you started, and usually, API documentation lists all endpoints you can query to get information. Besides, your tool acts like a view for the API, so you need to know what you have to send, what you can expect, and what is required to make requests.

RTFM!

So go ahead and have a quick look the 2 links below and just read through them.
I'll be focusing on /file/scan and on /file/report for the purpose of this article!

The VT API: /file/scan

The endpoint is described as follows:

This endpoint allows you to send a file for scanning with VirusTotal. Before performing your submissions we encourage you to retrieve the latest report on the file, if it is recent enough you might want to save time and bandwidth by making use of it. File size limit is 32MB, in order to submit files up to 200MB in size you must request a special upload URL using the /file/scan/upload_url endpoint.

The python example looks like this:

import requests
url = 'https://www.virustotal.com/vtapi/v2/file/scan'
params = {'apikey': '<apikey>'}
files = {'file': ('myfile.exe', open('myfile.exe', 'rb'))}
response = requests.post(url, files=files, params=params)
print(response.json())

What is going on?

Let's run this and have a look at the data that's being returned:

{
    "scan_id":"8fbc375f08b4cb9b55c64f14b32891f9703ab3e69ca13f504deec7655fcd13b6-1582211127"
    "sha1":"552d86c190fb6ad0f4734f44e59dce91fc364230"
    "resource":"8fbc375f08b4cb9b55c64f14b32891f9703ab3e69ca13f504deec7655fcd13b6"
    "response_code":1
    "sha256":"8fbc375f08b4cb9b55c64f14b32891f9703ab3e69ca13f504deec7655fcd13b6"
    "permalink":"https://www.virustotal.com/file/8fbc375f08b4cb9b55c64f14b32891f9703ab3e69ca13f504deec7655fcd13b6/ana ..."
    "md5":"f9615c7e8528ed16b213a796af2ef31b"
    "verbose_msg":"Scan request successfully queued, come back later for the report"
}

Looking good, although we don't have the results we're looking for yet, in terms of positives per scanner.

The VT API: /file/report

This endpoint is described as follows:

The resource argument can be the MD5, SHA-1 or SHA-256 of a file for which you want to retrieve the most recent antivirus report. You may also specify a scan_id returned by the /file/scan endpoint.

Again, the python code is straightforward:

import requests
url = 'https://www.virustotal.com/vtapi/v2/file/report'
params = {'apikey': '<apikey>', 'resource': '<resource>'}
response = requests.get(url, params=params)
print(response.json())

What is going on?

VT API: Chaining the endpoints together

What we ultimately want to reach, is that we can run the script, pass a file to it, and it uploads and scans the file and prints the result without our intervention. By now we know a few things:

Here's the function I wrote to contact the /file/scan endpoint:

def scan_single_file(file):
    url = 'https://www.virustotal.com/vtapi/v2/file/scan'
    with open(file, "rb") as _f:
        with requests.Session() as _sess:
            response = _sess.post(url, files={'file': _f}, params=params)
    json_resp = response.json()
    resource = json_resp['resource'] # Extract the "resource" value from the JSON data
    _print_prefixed_message('*', 'yellow', f'Getting Scan Result for {file}')
    generate_scan_report(resource) # Call function to generate scan report based on the resource

The first thing you might notice is that I do not have a declaration for params in this function.

That is because params has to be sent to the endpoint every time. So if we want to reuse it in every function and it never changes, we can easily set a global variable at the top of our code that will act as a constant.

A lot of people dislike working with global variables because it might not always be clear where the values are coming from.

from sensitive import APIKEY
params = {'apikey': APIKEY}

def ...

Because we are writing a single, non-object-oriented script, this is perfectly fine, and it should still be clear where this variable is coming from.

Apart from that, we're pretty much doing the same as the example script, but we're making sure our files and sessions get closed properly by using the with .. as .. :-format. Don't worry too much about _print_prefixed_message(), we'll get to that later when we discuss the magic that Click is!

If you don't understand json_resp['resource'], remember that Json is just another dict in python, and dicts have keys!

In [1]: example_json = { 'id': 1, 'name': 'Jarvis' }

In [2]: type(example_json)
Out[2]: dict

In [3]: example_json.keys()
Out[3]: dict_keys(['id', 'name'])

In [4]: example_json['name']
Out[4]: 'Jarvis'

So now we have a function to upload the file and get the resource-id. Next we'll need a function to get the scan report based on this resource-id!

def generate_scan_report(resource_id):
    url = 'https://www.virustotal.com/vtapi/v2/file/report'
    local_params = {'resource': resource_id}

    full_params = dict()
    full_params.update(params)
    full_params.update(local_params)

    with requests.Session() as _sess:
        response = _sess.get(url, params=full_params)

    json_resp = response.json()
    for key in json_resp.keys():
        if key == "scans":
            vendor_table = json_resp[key]
        elif key == "verbose_msg":
            result_message = json_resp[key]
        elif key == "total":
            total_scans = json_resp[key]
        elif key == "positives":
            total_positives = json_resp[key]
        elif key == "permalink":
            permalink = json_resp[key]
        elif key == "scan_date":
            scan_date = json_resp[key]
    print_scan_report(vendor_table, permalink, scan_date, result_message, total_scans, total_positives)

What is going on?

If we look at the data that's coming back from the report endpoint we see it looks like this:

{'scans': {'Bkav': {'detected': False, 'version': '1.3.0.9899', ...

Awesome that's exactly what we need but in it's current state, the tool is not really usable.

However, in terms of what we need to know from the API, we're all done. Very often, creating a tool that chains API endpoints together can be done with just a few lines of code and the requests library.

You can go out there right now, find an API of your liking, and start getting that data!

When you've done all that, you can come back here and we'll get to making things pretty and usable!

Making pretty CLI Tools with Click

If you've read my previous post, you know I like my data pretty!

So the end goal in this chapter will be to go from this:

Ewwww Ugly

To this:

Woaahhh Pretty

Helpers, they really do help

Earlier, I promised an explanation for that _print_prefixed_message() function. The click library provides 2 basic print functions click.echo and click.secho

There's the option to add style() to click.echo, but secho already does this for us.

From the docs:

The combination of echo() and style() is also available in a single function called secho()

Here's the prototype:

click.secho(message=None, file=None, nl=True, err=False, color=None, **styles)

Do you see those little [:)]'s and [!!]'s in their respective colors?

In the beginning they were all individually printed, so I had multiple calls actually doing the same. When you're duplicating code, there's probably a way to throw it in a function:

def _print_prefix(character, color):
    click.secho('[', nl=False)
    click.secho(character, fg=color, nl=False)
    click.secho('] ', nl=False)

def _print_prefixed_message(character, color, message):
    _print_prefix(character, color)
    click.secho(message)

What's going on?

Instead of having to write 4 click.secho's every time I want to print a message, I can simply call:

_print_prefixed_message('*', 'yellow', f'Yellow prefixed message for you!')

If you have the feeling you're duplicating code and just making minor changes, take the time to look at what you're doing and sometimes, you can make a helper to help you!

Here's an example use of _print_prefix() in case you're rightfully wondering why I broke those up.

def print_vendor_table(vendordict):
    for vd in vendordict.keys():
        if vendordict[vd]['detected']:
            _print_prefix('!!', 'red')
            click.secho(vd, fg='red')
        else:
            _print_prefix(':)', 'green')
            click.secho(vd, fg='green')

And even here we have some duplicate pieces that we could optimize. Everything in these helpers could probably also have been done with decorators. But the code is functional, readable and works, so that's enough for now (feel free to submit a PR if you like!)

Options and flags

Again, what we want to achieve, is a tool where we can simply go:

python tool.py -f /file/to/scan.exe

To make this type of behavior easier to implement, click offers a couple of decorators, here's the head of my main() function to give you an idea:

@click.command()
@click.option("-w", "--watcher", default=False, is_flag=True)
@click.option("-D", "--directory", type=str, default=None)
@click.option("-f", "--file", type=str, default=None)
def main(**kwargs):
    ...

What's going on? - First we're saying that the following function is our command, click automatically adds a --help to commands.

So if I now run:

python vtscan.py --file ./myfile.exe

This is where I'll close up around click, if you want to know more about this awesome library be sure to read the docs!

More functionality: Adding the Watcher

This part covers how I added a directory watcher and some of the challenges I faced.

Aside from the normal vt-cli behavior, I wanted to also be able to drop files in a directory and have them scanned automatically.

We'll have a look at a script I found to do the actual directory matching, and we'll look at the module used: pywin32.

I won't go too much in depth on the Win32 API because that's a whole different writeup.

Additional work on the arguments

When the program is running interactively with -f and it receives the file, it can't start it's watcher loop. If the program is running as a watcher, the --directory has to be specified and we shouldn't prompt the user for more details to prevent interruption.

I also wanted to add my own messages so I could use the same format on my errors as I did for the rest of the scan reports, like this:

Pretty Errors

def parse_cli_options(**kwargs):
    global is_watcher

    watcher_opt = kwargs['watcher']
    dir_opt = kwargs['directory']
    file_opt = kwargs['file']

    if watcher_opt:
        _print_prefixed_message("*", "cyan", "Running as watcher!")
        is_watcher = True

        if dir_opt is None:
            _print_prefixed_message("E", "red", "You need to specify a directory to watch when running as Watcher!")
            _print_prefixed_message("i", "cyan", "Run VTScan.py --help for more info")
            exit()

    elif file_opt is None:
        _print_prefixed_message("E", "red", "You must specify a file when running interactively")
        _print_prefixed_message("i", "cyan", "Run VTScan.py --help for more info")
        exit()

    return watcher_opt, dir_opt, file_opt

@click.command()
@click.option("-w", "--watcher", default=False, is_flag=True)
@click.option("-D", "--directory", type=str, default=None)
@click.option("-f", "--file", type=str, default=None)
def main(**kwargs):
    watcher_opt, dir_opt, file_opt = parse_cli_options(**kwargs)
    if watcher_opt:
        run_as_watcher(dir_opt)
    else:
        scan_single_file(file_opt)

What's going on?

Here's an example use of that global variable to toggle some functionality:

    if not is_watcher:
        click.secho('Show Detail? [y/n]: ', nl=False)
        c = click.getchar()
        click.echo()
        if c.upper() == 'Y':
            print_vendor_table(vendordict)
        if c.upper() == 'N':
            click.secho("Exiting!")
        exit()

If we're running as a watcher, we don't need to ask the user to show details. Very convenient!

Watching a directory: Win32

Credit where credit is due, for this part, I only slightly modified the code I found here.

It elegantly uses pywin32 to access the Win32 API and loops ReadDirectoryChangesW to check a directory for changes. Perfect for our directory watcher!

def run_as_watcher(directory):
    path_to_watch = directory

    hDir = win32file.CreateFile(
        path_to_watch,
        FILE_LIST_DIRECTORY,
        win32con.FILE_SHARE_READ | win32con.FILE_SHARE_WRITE | win32con.FILE_SHARE_DELETE,
        None,
        win32con.OPEN_EXISTING,
        win32con.FILE_FLAG_BACKUP_SEMANTICS,
        None
    )
    try:
        while 1:
                results = win32file.ReadDirectoryChangesW (
                    hDir,
                    1024,
                    True,
                    win32con.FILE_NOTIFY_CHANGE_FILE_NAME |
                    win32con.FILE_NOTIFY_CHANGE_DIR_NAME |
                    win32con.FILE_NOTIFY_CHANGE_ATTRIBUTES |
                    win32con.FILE_NOTIFY_CHANGE_SIZE |
                    win32con.FILE_NOTIFY_CHANGE_LAST_WRITE |
                    win32con.FILE_NOTIFY_CHANGE_SECURITY,
                    None,
                    None
                )
                for action, file in results:
                    full_filename = os.path.join(path_to_watch, file)
                    if ACTION.get(action, "Unknown") == "Created":
                        time.sleep(1)
                        scan_single_file(full_filename)

    except KeyboardInterrupt:
        print("Exiting!")
        exit()

What's going on?

One way to replace sleep would be to wait for the file to no longer be in use, which is something for the future. Right now, the program will fail with an "Access Denied" if you paste a large file that takes longer than 1 second.

And that's all there's to it!

Watcher CLI

Adding Toast messages: Win10Toast

A final library I added so I wouldn't have to go back to the CLI log everytime, was win10toast.

Again, it's awesome how little code is needed to create a pretty toast message!

# Show results in Toast!
        toaster = ToastNotifier()
        toaster.show_toast("Scan Complete!", f"Positives: {positives} / {total}", duration=5, icon_path=".\\favicon.ico")

And we have a fully operational tool for our day to day job!

Toast in Action GIF

--

Thanks for reading, I hope you enjoyed it as much as I enjoyed writing it. If you have any remarks or questions, you can likely find me on the Pybites Slack Channel as 'Jarvis'.

Keep calm and code in Python!

-- Cedric

February 24, 2020 08:39 PM UTC


Stack Abuse

Introduction to Image Processing in Python with OpenCV

Introduction

In this tutorial, we are going to learn how we can perform image processing using the Python language. We are not going to restrict ourselves to a single library or framework; however, there is one that we will be using the most frequently, the Open CV library. We will start off by talking a little about image processing and then we will move on to see different applications/scenarios where image processing can come in handy. So, let's begin!

What is Image Processing?

It is important to know what exactly image processing is and what is its role in the bigger picture before diving into its how's. Image Processing is most commonly termed as 'Digital Image Processing' and the domain in which it is frequently used is 'Computer Vision'. Don't be confused - we are going to talk about both of these terms and how they connect. Both Image Processing algorithms and Computer Vision (CV) algorithms take an image as input; however, in image processing, the output is also an image, whereas in computer vision the output can be some features/information about the image.

Why do we need it?

The data that we collect or generate is mostly raw data, i.e. it is not fit to be used in applications directly due to a number of possible reasons. Therefore, we need to analyze it first, perform the necessary pre-processing, and then use it.

For instance, let's assume that we were trying to build a cat classifier. Our program would take an image as input and then tell us whether the image contains a cat or not. The first step for building this classifier would be to collect hundreds of cat pictures. One common issue is that all the pictures we have scraped would not be of the same size/dimensions, so before feeding them to the model for training, we would need to resize/pre-process them all to a standard size.

This is just one of many reasons why image processing is essential to any computer vision application.

Prerequisites

Before going any further, let's discuss what you need to know in order to follow this tutorial with ease. Firstly, you should have some basic programming knowledge in any language. Secondly, you should know what machine learning is and the basics of how it works, as we will be using some machine learning algorithms for image processing in this article. As a bonus, it would help if you have had any exposure to, or basic knowledge of, Open CV before going on with this tutorial. But this is not required.

One thing you should definitely know in order to follow this tutorial is how exactly an image is represented in memory. Each image is represented by a set of pixels i.e. a matrix of pixel values. For a grayscale image, the pixel values range from 0 to 255 and they represent the intensity of that pixel. For instance, if you have an image of 20 x 20 dimensions, it would be represented by a matrix of 20x20 (a total of 400-pixel values).

If you are dealing with a colored image, you should know that it would have three channels - Red, Green, and Blue (RGB). Therefore, there would be three such matrices for a single image.

Installation

Note: Since we are going to use OpenCV via Python, it is an implicit requirement that you already have Python (version 3) already installed on your workstation.

Windows

$ pip install opencv-python

MacOS

$ brew install opencv3 --with-contrib --with-python3

Linux

$ sudo apt-get install libopencv-dev python-opencv

To check if your installation was successful or not, run the following command in either a Python shell or your command prompt:

import cv2

Some Basics You Should Know

Before we move on to using Image Processing in an application, it is important to get an idea of what kind of operations fall into this category, and how to do those operations. These operations, along with others, would be used later on in our applications. So, let's get to it.

For this article we'll be using the following image:

original image used for basic image processing

Note: The image has been scaled for the sake of displaying it in this article, but the original size we are using is about 1180x786.

You probably noticed that the image is currently colored, which means it is represented by three color channels i.e. Red, Green, and Blue. We will be converting the image to grayscale, as well as splitting the image into its individual channels using the code below.

Finding Image Details

After loading the image with the imread() function, we can then retrieve some simple properties about it, like the number of pixels and dimensions:

import cv2

img = cv2.imread('rose.jpg')

print("Image Properties")
print("- Number of Pixels: " + str(img.size))
print("- Shape/Dimensions: " + str(img.shape))

Output:

Image Properties
- Number of Pixels: 2782440
- Shape/Dimensions: (1180, 786, 3)

Splitting an Image into Individual Channels

Now we'll split the image in to its red, green, and blue components using OpenCV and display them:

from google.colab.patches import cv2_imshow

blue, green, red = cv2.split(img) # Split the image into its channels
img_gs = cv2.imread('rose.jpg', cv2.IMREAD_GRAYSCALE) # Convert image to grayscale

cv2_imshow(red) # Display the red channel in the image
cv2_imshow(blue) # Display the red channel in the image
cv2_imshow(green) # Display the red channel in the image
cv2_imshow(img_gs) # Display the grayscale version of image

For brevity, we'll just show the grayscale image.

Grayscale Image:

flower image in greyscale

Image Thresholding

The concept of thresholding is quite simple. As discussed above in the image representation, pixel values can be any value between 0 to 255. Let's say we wish to convert an image into a binary image i.e. assign a pixel either a value of 0 or 1. To do this, we can perform thresholding. For instance, if the Threshold (T) value is 125, then all pixels with values greater than 125 would be assigned a value of 1, and all pixels with values lesser than or equal to that would be assigned a value of 0. Let's do that through code to get a better understanding.

Image used for Thresholding:

image used for thresholding
import cv2

# Read image
img = cv2.imread('image.png', 0)

# Perform binary thresholding on the image with T = 125
r, threshold = cv2.threshold(img, 125, 255, cv2.THRESH_BINARY)
cv2_imshow(threshold)

Output:

image thresholding output

As you can see, in the resultant image, two regions have been established, i.e. the black region (pixel value 0) and white region (pixel value 1). Turns out, the threshold we set was right in the middle of the image, which is why the black and white values are divided there.

Applications

#1: Removing Noise from an Image

Now that you have got a basic idea of what image processing is and what it is used for, let's go ahead and learn about some of its specific applications.

In most cases, the raw data that we gather has noise in it i.e. unwanted features that makes the image hard to perceive. Although these images can be used directly for feature extraction, the accuracy of the algorithm would suffer greatly. This is why image processing is applied to the image before passing it to the algorithm to get better accuracy.

There are many different types of noise, like Gaussian noise, salt and pepper noise, etc. We can remove that noise from an image by applying a filter which removes that noise, or at the very least, minimizes its effect. There are a lot of options when it comes to filters as well, each of them has different strengths, and hence is the best for a specific kind of noise.

To understand this properly, we are going to add 'salt and pepper' noise to the grayscale version of the rose image that we considered above, and then try to remove that noise from our noisy image using different filters and see which one is best-fit for that type.

import numpy as np

# Adding salt & pepper noise to an image
def salt_pepper(prob):
      # Extract image dimensions
      row, col = img_gs.shape

      # Declare salt & pepper noise ratio
      s_vs_p = 0.5
      output = np.copy(img_gs)

      # Apply salt noise on each pixel individually
      num_salt = np.ceil(prob * img_gs.size * s_vs_p)
      coords = [np.random.randint(0, i - 1, int(num_salt))
            for i in img_gs.shape]
      output[coords] = 1

      # Apply pepper noise on each pixel individually
      num_pepper = np.ceil(prob * img_gs.size * (1. - s_vs_p))
      coords = [np.random.randint(0, i - 1, int(num_pepper))
            for i in img_gs.shape]
      output[coords] = 0
      cv2_imshow(output)

      return output

# Call salt & pepper function with probability = 0.5
# on the grayscale image of rose
sp_05 = salt_pepper(0.5)

# Store the resultant image as 'sp_05.jpg'
cv2.imwrite('sp_05.jpg', sp_05)

Alright, we have added noise to our rose image, and this is what it looks like now:

Noisy Image:

image with noise

Lets now apply different filters on it and note down our observations i.e. how well each filter reduces the noise.

Arithmetic Filter with Sharpening Kernel
# Create our sharpening kernel, the sum of all values must equal to one for uniformity
kernel_sharpening = np.array([[-1,-1,-1],
                              [-1, 9,-1],
                              [-1,-1,-1]])

# Applying the sharpening kernel to the grayscale image & displaying it.
print("\n\n--- Effects on S&P Noise Image with Probability 0.5 ---\n\n")

# Applying filter on image with salt & pepper noise
sharpened_img = cv2.filter2D(sp_05, -1, kernel_sharpening)
cv2_imshow(sharpened_img)

The resulting image, from applying arithmetic filter on the image with salt and pepper noise, is shown below. Upon comparison with the original grayscale image, we can see that it brightens the image too much and is unable to highlight the bright spots on the rose as well. Hence, it can be concluded that arithmetic filter fails to remove salt and pepper noise.

Arithmetic Filter Output:

image without noise via arithmetic filter
Midpoint Filter
from scipy.ndimage import maximum_filter, minimum_filter

def midpoint(img):
    maxf = maximum_filter(img, (3, 3))
    minf = minimum_filter(img, (3, 3))
    midpoint = (maxf + minf) / 2
    cv2_imshow(midpoint)

print("\n\n---Effects on S&P Noise Image with Probability 0.5---\n\n")
midpoint(sp_05)

The resulting image, from applying th Midpoint Filter on the image with salt and pepper noise, is shown below. Upon comparison with the original grayscale image, we can see that, like the kernel method above, brightens the image too much; however, it is able to highlight the bright spots on the rose. Therefore, we can say that it is a better choice than the arithmetic filter, but still it does not recover the original image completely.

Midpoint Filter Output:

image without noise via midpoint filter
Contraharmonic Mean Filter

Note: The implementations of these filters can be found online easily and how exactly they work is out of scope for this tutorial. We will be looking at the applications from an abstract/higher level.

def contraharmonic_mean(img, size, Q):
    num = np.power(img, Q + 1)
    denom = np.power(img, Q)
    kernel = np.full(size, 1.0)
    result = cv2.filter2D(num, -1, kernel) / cv2.filter2D(denom, -1, kernel)
    return result

print("\n\n--- Effects on S&P Noise Image with Probability 0.5 ---\n\n")
cv2_imshow(contraharmonic_mean(sp_05, (3,3), 0.5))

The resulting image, from applying Contraharmonic Mean Filter on the image with salt and pepper noise, is shown below. Upon comparison with the original grayscale image, we can see that it has reproduced pretty much the exact same image as the original one. Its intensity/brightness level is the same and it highlights the bright spots on the rose as well. Hence, we can conclude that contraharmonic mean filter is very effective in dealing with salt and pepper noise.

Contraharmonic Mean Filter Output:

image without noise via contraharmonic filter

Now that we have found the best filter to recover the original image from a noisy one, we can move on to our next application.

#2: Edge Detection using Canny Edge Detector

The rose image that we have been using so far has a constant background i.e. black, therefore, we will be using a different image for this application to better show the algorithm's capabilities. The reason is that if the background is constant, it makes the edge detection task rather simple, and we don't want that.

We talked about a cat classifier earlier in this tutorial, let's take that example forward and see how image processing plays an integral role in that.

In a classification algorithm, the image is first scanned for 'objects' i.e. when you input an image, the algorithm would find all the objects in that image and then compare them against the features of the object that you are trying to find. In case of a cat classifier, it would compare all objects found in an image against the features of a cat image, and if a match is found, it tells us that the input image contains a cat.

Since we are using the cat classifier as an example, it is only fair that we use a cat image going forward. Below is the image we will be using:

Image used for Edge Detection:

image used for edge detection

import cv2
import numpy as np
from matplotlib import pyplot as plt

# Declaring the output graph's size
plt.figure(figsize=(16, 16))

# Convert image to grayscale
img_gs = cv2.imread('cat.jpg', cv2.IMREAD_GRAYSCALE)
cv2.imwrite('gs.jpg', img_gs)

# Apply canny edge detector algorithm on the image to find edges
edges = cv2.Canny(img_gs, 100,200)

# Plot the original image against the edges
plt.subplot(121), plt.imshow(img_gs)
plt.title('Original Gray Scale Image')
plt.subplot(122), plt.imshow(edges)
plt.title('Edge Image')

# Display the two images
plt.show()

Edge Detection Output:

edge detection output

As you can see, the part of the image which contains an object, which in this case is a cat, has been dotted/separated through edge detection. Now you must be wondering, what is the Canny Edge Detector and how did it make this happen; so let's discuss that now.

To understand the above, there are three key steps that need to be discussed. First, it performs noise reduction on the image in a similar manner that we discussed previously. Second, it uses the first derivative at each pixel to find edges. The logic behind this is that the point where an edge exists, there is an abrupt intensity change, which causes a spike in the first derivative's value, hence making that pixel an 'edge pixel'.

At the end, it performs hysteresis thresholding; we said above that there's a spike in the value of first derivative at an edge, but we did not state 'how high' the spike needs to be for it to be classified as an edge - this is called a threshold! Earlier in this tutorial we discussed what simple thresholding is. Hysteresis thresholding is an improvement on that, it makes use of two threshold values instead of one. The reason behind that is, if the threshold value is too high, we might miss some actual edges (true negatives) and if the value is too low, we would get a lot of points classified as edges that actually are not edges (false positives). One threshold value is set high, and one is set low. All points which are above the 'high threshold value' are identified as edges, then all points which are above the low threshold value but below the high threshold value are evaluated; the points which are close to, or are neighbors of, points which have been identified as edges, are also identified as edges and the rest are discarded.

These are the underlying concepts/methods that Canny Edge Detector algorithm uses to identify edges in an image.

Conclusion

In this article, we learned how to install OpenCV, the most popular library for image processing in Python, on different platforms like Windows, MacOS, and Linux, as well as how to verify that the installation was successful.

We went on to discuss what Image Processing is and its uses in the computer vision domain of Machine Learning. We talked about some common types of noise and how we can remove it from our images using different filters, before using the images in our applications.

Furthermore, we learned how image processing plays an integral part in high-end applications like Object Detection or classification. Do note that this article was just the tip of the iceberg, and Digital Image Processing has a lot more in the store that cannot possibly be covered in a single tutorial. Reading this should enable you to dive deeper and learn about other advanced concepts related to image processing. Good Luck!

February 24, 2020 05:43 PM UTC


Andre Roberge

From a rejected Pycon talk to a new project.

Like many others, my talk proposal (early draft here) for Pycon US was rejected. So, I decided to spend some time putting everything in a new project instead. (Documentation here.)  It is still a rough draft, but usable ... and since I've mentioned it in a few other places, I thought I should mention it here as well.




February 24, 2020 04:38 PM UTC


Anarcat

The CLA Denial-Of-Service attack

I just stumbled upon this weird mind bender this morning. I have found what I believe is a simple typo in the Ganeti documentation which has a trivial fix. But then, before I submitted a PR to fix it, I remembered that I had trouble getting stuff merged in Ganeti before. That's because they require a CLA (which is already annoying enough) that requires a Google account to sign (which is simply unacceptable). So that patch has been sitting there for months, unused and I haven't provided a patch for the other issue because of this very problem.

But that got me thinking. If I would want to mess things up real bad in a CLA-using project I don't like and:

  1. find a critical bug
  2. figure out a patch for the bug
  3. publish the patch in their issue tracker
  4. forever refuse to sign the CLA

Then my patch, and any derivative, would be unmergeable. If the bug is trivial enough, it might even be impossible to fix it without violating the letter of the law, or at least the process that project as adhered to.

Obviously, there's a flaw in that logic. A CLA is an agreement between a project and a (new) contributor. A project does not absolutely requires the contributor to sign the agreement to accept its contributions, in theory. It's the reverse: for the contributor to have their patch accepted, they need to accept the CLA. But the project could accept contributions without CLA without violating the law.

But it seems that projects sometimes end up doing a DOS on themselves by refusing perfectly fine contributions from drive-by contributors who don't have time to waste filling forms on all projects they stumble upon.

In the case of this typo, I could have submitted a patch, but because I didn't sign a CLA, again, the project couldn't have merged it without breaking their own rules, even if someone else submits the same patch, after agreeing to the CLA. So, in effect, I would have DOS'd the project by providing the patch, so I just opened an issue which strangely — and hopefully — isn't covered by the CLA.

Feels kind of stupid, really...

Instances of known self-imposed CLA DOS attacks:

February 24, 2020 03:32 PM UTC


Real Python

Introduction to Python SQL Libraries

All software applications interact with data, most commonly through a database management system (DBMS). Some programming languages come with modules that you can use to interact with a DBMS, while others require the use of third-party packages. In this tutorial, you’ll explore the different Python SQL libraries that you can use. You’ll develop a straightforward application to interact with SQLite, MySQL, and PostgreSQL databases.

In this tutorial, you’ll learn how to:

To get the most out of this tutorial, you should have knowledge of basic Python, SQL, and working with database management systems. You should also be able to download and import packages in Python and know how to install and run different database servers locally or remotely.

Free PDF Download: Python 3 Cheat Sheet

Understanding the Database Schema

In this tutorial, you’ll develop a very small database for a social media application. The database will consist of four tables:

  1. users
  2. posts
  3. comments
  4. likes

A high-level diagram of the database schema is shown below:

python-sql-database-schema

Both users and posts will have a one-to-many relationship since one user can like many posts. Similarly, one user can post many comments, and one post can also have multiple comments. So, both users and posts will also have one-to-many relationships with the comments table. This also applies to the likes table, so both users and posts will have a one-to-many relationship with the likes table.

Using Python SQL Libraries to Connect to a Database

Before you interact with any database through a Python SQL Library, you have to connect to that database. In this section, you’ll see how to connect to SQLite, MySQL, and PostgreSQL databases from within a Python application.

Note: You’ll need MySQL and PostgreSQL servers up and running before you execute the scripts in the MySQL and PostgreSQL database sections. For a quick intro on how to start a MySQL server, check out the MySQL section of Starting a Django Project. To learn how to create a database in PostgreSQL, check out the Setting Up a Database section of Preventing SQL Injection Attacks With Python.

It’s recommended that you create three different Python files, so you have one for each of the three databases. You’ll execute the script for each database in its corresponding file.

SQLite

SQLite is probably the most straightforward database to connect to with a Python application since you don’t need to install any external Python SQL modules to do so. By default, your Python installation contains a Python SQL library named sqlite3 that you can use to interact with an SQLite database.

What’s more, SQLite databases are serverless and self-contained, since they read and write data to a file. This means that, unlike with MySQL and PostgreSQL, you don’t even need to install and run an SQLite server to perform database operations!

Here’s how you use sqlite3 to connect to an SQLite database in Python:

 1 import sqlite3
 2 from sqlite3 import Error
 3 
 4 def create_connection(path):
 5     connection = None
 6     try:
 7         connection = sqlite3.connect(path)
 8         print("Connection to SQLite DB successful")
 9     except Error as e:
10         print(f"The error '{e}' occurred")
11 
12     return connection

Here’s how this code works:

sqlite3.connect(path) returns a connection object, which is in turn returned by create_connection(). This connection object can be used to execute queries on an SQLite database. The following script creates a connection to the SQLite database:

connection = create_connection("E:\\sm_app.sqlite")

Once you execute the above script, you’ll see that a database file sm_app.sqlite is created in the root directory. Note that you can change the location to match your setup.

MySQL

Unlike SQLite, there’s no default Python SQL module that you can use to connect to a MySQL database. Instead, you’ll need to install a Python SQL driver for MySQL in order to interact with a MySQL database from within a Python application. One such driver is mysql-connector-python. You can download this Python SQL module with pip:

$ pip install mysql-connector-python

Note that MySQL is a server-based database management system. One MySQL server can have multiple databases. Unlike SQLite, where creating a connection is tantamount to creating a database, a MySQL database has a two-step process for database creation:

  1. Make a connection to a MySQL server.
  2. Execute a separate query to create the database.

Define a function that connects to the MySQL database server and returns the connection object:

 1 import mysql.connector
 2 from mysql.connector import Error
 3 
 4 def create_connection(host_name, user_name, user_password):
 5     connection = None
 6     try:
 7         connection = mysql.connector.connect(
 8             host=host_name,
 9             user=user_name,
10             passwd=user_password
11         )
12         print("Connection to MySQL DB successful")
13     except Error as e:
14         print(f"The error '{e}' occurred")
15 
16     return connection
17 
18 connection = create_connection("localhost", "root", "")

In the above script, you define a function create_connection() that accepts three parameters:

  1. host_name
  2. user_name
  3. user_password

The mysql.connector Python SQL module contains a method .connect() that you use in line 7 to connect to a MySQL database server. Once the connection is established, the connection object is returned to the calling function. Finally, in line 18 you call create_connection() with the host name, username, and password.

So far, you’ve only established the connection. The database is not yet created. To do this, you’ll define another function create_database() that accepts two parameters:

  1. connection is the connection object to the database server that you want to interact with.
  2. query is the query that creates the database.

Here’s what this function looks like:

def create_database(connection, query):
    cursor = connection.cursor()
    try:
        cursor.execute(query)
        print("Database created successfully")
    except Error as e:
        print(f"The error '{e}' occurred")

To execute queries, you use the cursor object. The query to be executed is passed to cursor.execute() in string format.

Create a database named sm_app for your social media app in the MySQL database server:

create_database_query = "CREATE DATABASE sm_app"
create_database(connection, create_database_query)

Now you’ve created a database sm_app on the database server. However, the connection object returned by the create_connection() is connected to the MySQL database server. You need to connect to the sm_app database. To do so, you can modify create_connection() as follows:

 1 def create_connection(host_name, user_name, user_password, db_name):
 2     connection = None
 3     try:
 4         connection = mysql.connector.connect(
 5             host=host_name,
 6             user=user_name,
 7             passwd=user_password,
 8             database=db_name
 9         )
10         print("Connection to MySQL DB successful")
11     except Error as e:
12         print(f"The error '{e}' occurred")
13 
14     return connection

You can see in line 8 that create_connection() now accepts an additional parameter called db_name. This parameter specifies the name of the database that you want to connect to. You can pass in the name of the database you want to connect to when you call this function:

connection = create_connection("localhost", "root", "", "sm_app")

The above script successfully calls create_connection() and connects to the sm_app database.

PostgreSQL

Like MySQL, there’s no default Python SQL library that you can use to interact with a PostgreSQL database. Instead, you need to install a third-party Python SQL driver to interact with PostgreSQL. One such Python SQL driver for PostgreSQL is psycopg2. Execute the following command on your terminal to install the psycopg2 Python SQL module:

$ pip install psycopg2

Like with the SQLite and MySQL databases, you’ll define create_connection() to make a connection with your PostgreSQL database:

import psycopg2
from psycopg2 import OperationalError

def create_connection(db_name, db_user, db_password, db_host, db_port):
    connection = None
    try:
        connection = psycopg2.connect(
            database=db_name,
            user=db_user,
            password=db_password,
            host=db_host,
            port=db_port,
        )
        print("Connection to PostgreSQL DB successful")
    except OperationalError as e:
        print(f"The error '{e}' occurred")
    return connection

You use psycopg2.connect() to connect to a PostgreSQL server from within your Python application.

You can then use create_connection() to create a connection to a PostgreSQL database. First, you’ll make a connection with the default database postgres by using the following string:

connection = create_connection(
    "postgres", "postgres", "abc123", "127.0.0.1", "5432"
)

Next, you have to create the database sm_app inside the default postgres database. You can define a function to execute any SQL query in PostgreSQL. Below, you define create_database() to create a new database in the PostgreSQL database server:

def create_database(connection, query):
    connection.autocommit = True
    cursor = connection.cursor()
    try:
        cursor.execute(query)
        print("Query executed successfully")
    except OperationalError as e:
        print(f"The error '{e}' occurred")

create_database_query = "CREATE DATABASE sm_app"
create_database(connection, create_database_query)

Once you run the script above, you’ll see the sm_app database in your PostgreSQL database server.

Before you execute queries on the sm_app database, you need to connect to it:

connection = create_connection(
    "sm_app", "postgres", "abc123", "127.0.0.1", "5432"
)

Once you execute the above script, a connection will be established with the sm_app database located in the postgres database server. Here, 127.0.0.1 refers to the database server host IP address, and 5432 refers to the port number of the database server.

Creating Tables

In the previous section, you saw how to connect to SQLite, MySQL, and PostgreSQL database servers using different Python SQL libraries. You created the sm_app database on all three database servers. In this section, you’ll see how to create tables inside these three databases.

As discussed earlier, you’ll create four tables:

  1. users
  2. posts
  3. comments
  4. likes

You’ll start with SQLite.

SQLite

To execute queries in SQLite, use cursor.execute(). In this section, you’ll define a function execute_query() that uses this method. Your function will accept the connection object and a query string, which you’ll pass to cursor.execute().

.execute() can execute any query passed to it in the form of string. You’ll use this method to create tables in this section. In the upcoming sections, you’ll use this same method to execute update and delete queries as well.

Note: This script should be executed in the same file where you created the connection for your SQLite database.

Here’s your function definition:

def execute_query(connection, query):
    cursor = connection.cursor()
    try:
        cursor.execute(query)
        connection.commit()
        print("Query executed successfully")
    except Error as e:
        print(f"The error '{e}' occurred")

This code tries to execute the given query and prints an error message if necessary.

Next, write your query:

create_users_table = """
CREATE TABLE IF NOT EXISTS users (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  name TEXT NOT NULL,
  age INTEGER,
  gender TEXT,
  nationality TEXT
);
"""

This says to create a table users with the following five columns:

  1. id
  2. name
  3. age
  4. gender
  5. nationality

Finally, you’ll call execute_query() to create the table. You’ll pass in the connection object that you created in the previous section, along with the create_users_table string that contains the create table query:

execute_query(connection, create_users_table)  

The following query is used to create the posts table:

create_posts_table = """
CREATE TABLE IF NOT EXISTS posts(
  id INTEGER PRIMARY KEY AUTOINCREMENT, 
  title TEXT NOT NULL, 
  description TEXT NOT NULL, 
  user_id INTEGER NOT NULL, 
  FOREIGN KEY (user_id) REFERENCES users (id)
);
"""

Since there’s a one-to-many relationship between users and posts, you can see a foreign key user_id in the posts table that references the id column in the users table. Execute the following script to create the posts table:

execute_query(connection, create_posts_table)

Finally, you can create the comments and likes tables with the following script:

create_comments_table = """
CREATE TABLE IF NOT EXISTS comments (
  id INTEGER PRIMARY KEY AUTOINCREMENT, 
  text TEXT NOT NULL, 
  user_id INTEGER NOT NULL, 
  post_id INTEGER NOT NULL, 
  FOREIGN KEY (user_id) REFERENCES users (id) FOREIGN KEY (post_id) REFERENCES posts (id)
);
"""

create_likes_table = """
CREATE TABLE IF NOT EXISTS likes (
  id INTEGER PRIMARY KEY AUTOINCREMENT, 
  user_id INTEGER NOT NULL, 
  post_id integer NOT NULL, 
  FOREIGN KEY (user_id) REFERENCES users (id) FOREIGN KEY (post_id) REFERENCES posts (id)
);
"""

execute_query(connection, create_comments_table)  
execute_query(connection, create_likes_table)            

You can see that creating tables in SQLite is very similar to using raw SQL. All you have to do is store the query in a string variable and then pass that variable to cursor.execute().

MySQL

You’ll use the mysql-connector-python Python SQL module to create tables in MySQL. Just like with SQLite, you need to pass your query to cursor.execute(), which is returned by calling .cursor() on the connection object. You can create another function execute_query() that accepts the connection and query string:

 1 def execute_query(connection, query):
 2     cursor = connection.cursor()
 3     try:
 4         cursor.execute(query)
 5         connection.commit()
 6         print("Query executed successfully")
 7     except Error as e:
 8         print(f"The error '{e}' occurred")

In line 4, you pass the query to cursor.execute().

Now you can create your users table using this function:

create_users_table = """
CREATE TABLE IF NOT EXISTS users (
  id INT AUTO_INCREMENT, 
  name TEXT NOT NULL, 
  age INT, 
  gender TEXT, 
  nationality TEXT, 
  PRIMARY KEY (id)
) ENGINE = InnoDB
"""

execute_query(connection, create_users_table)

The query for implementing the foreign key relation is slightly different in MySQL as compared to SQLite. What’s more, MySQL uses the AUTO_INCREMENT keyword (compared to the SQLite AUTOINCREMENT keyword) to create columns where the values are automatically incremented when new records are inserted.

The following script creates the posts table, which contains a foreign key user_id that references the id column of the users table:

create_posts_table = """
CREATE TABLE IF NOT EXISTS posts (
  id INT AUTO_INCREMENT, 
  title TEXT NOT NULL, 
  description TEXT NOT NULL, 
  user_id INTEGER NOT NULL, 
  FOREIGN KEY fk_user_id (user_id) REFERENCES users(id), 
  PRIMARY KEY (id)
) ENGINE = InnoDB
"""

execute_query(connection, create_posts_table)

Similarly, to create the comments and likes tables, you can pass the corresponding CREATE queries to execute_query().

PostgreSQL

Like with SQLite and MySQL databases, the connection object that’s returned by psycopg2.connect() contains a cursor object. You can use cursor.execute() to execute Python SQL queries on your PostgreSQL database.

Define a function execute_query():

def execute_query(connection, query):
    connection.autocommit = True
    cursor = connection.cursor()
    try:
        cursor.execute(query)
        print("Query executed successfully")
    except OperationalError as e:
        print(f"The error '{e}' occurred")

You can use this function to create tables, insert records, modify records, and delete records in your PostgreSQL database.

Now create the users table inside the sm_app database:

create_users_table = """
CREATE TABLE IF NOT EXISTS users (
  id SERIAL PRIMARY KEY,
  name TEXT NOT NULL, 
  age INTEGER,
  gender TEXT,
  nationality TEXT
)
"""

execute_query(connection, create_users_table)

You can see that the query to create the users table in PostgreSQL is slightly different than SQLite and MySQL. Here, the keyword SERIAL is used to create columns that increment automatically. Recall that MySQL uses the keyword AUTO_INCREMENT.

In addition, foreign key referencing is also specified differently, as shown in the following script that creates the posts table:

create_posts_table = """
CREATE TABLE IF NOT EXISTS posts (
  id SERIAL PRIMARY KEY, 
  title TEXT NOT NULL, 
  description TEXT NOT NULL, 
  user_id INTEGER REFERENCES users(id)
)
"""

execute_query(connection, create_posts_table)

To create the comments table, you’ll have to write a CREATE query for the comments table and pass it to execute_query(). The process for creating the likes table is the same. You only have to modify the CREATE query to create the likes table instead of the comments table.

Inserting Records

In the previous section, you saw how to create tables in your SQLite, MySQL, and PostgreSQL databases by using different Python SQL modules. In this section, you’ll see how to insert records into your tables.

SQLite

To insert records into your SQLite database, you can use the same execute_query() function that you used to create tables. First, you have to store your INSERT INTO query in a string. Then, you can pass the connection object and query string to execute_query(). Let’s insert five records into the users table:

create_users = """
INSERT INTO
  users (name, age, gender, nationality)
VALUES
  ('James', 25, 'male', 'USA'),
  ('Leila', 32, 'female', 'France'),
  ('Brigitte', 35, 'female', 'England'),
  ('Mike', 40, 'male', 'Denmark'),
  ('Elizabeth', 21, 'female', 'Canada');
"""

execute_query(connection, create_users)   

Since you set the id column to auto-increment, you don’t need to specify the value of the id column for these users. The users table will auto-populate these five records with id values from 1 to 5.

Now insert six records into the posts table:

create_posts = """
INSERT INTO
  posts (title, description, user_id)
VALUES
  ("Happy", "I am feeling very happy today", 1),
  ("Hot Weather", "The weather is very hot today", 2),
  ("Help", "I need some help with my work", 2),
  ("Great News", "I am getting married", 1),
  ("Interesting Game", "It was a fantastic game of tennis", 5),
  ("Party", "Anyone up for a late-night party today?", 3);
"""

execute_query(connection, create_posts)  

It’s important to mention that the user_id column of the posts table is a foreign key that references the id column of the users table. This means that the user_id column must contain a value that already exists in the id column of the users table. If it doesn’t exist, then you’ll see an error.

Similarly, the following script inserts records into the comments and likes tables:

create_comments = """
INSERT INTO
  comments (text, user_id, post_id)
VALUES
  ('Count me in', 1, 6),
  ('What sort of help?', 5, 3),
  ('Congrats buddy', 2, 4),
  ('I was rooting for Nadal though', 4, 5),
  ('Help with your thesis?', 2, 3),
  ('Many congratulations', 5, 4);
"""

create_likes = """
INSERT INTO
  likes (user_id, post_id)
VALUES
  (1, 6),
  (2, 3),
  (1, 5),
  (5, 4),
  (2, 4),
  (4, 2),
  (3, 6);
"""

execute_query(connection, create_comments)
execute_query(connection, create_likes)  

In both cases, you store your INSERT INTO query as a string and execute it with execute_query().

MySQL

There are two ways to insert records into MySQL databases from a Python application. The first approach is similar to SQLite. You can store the INSERT INTO query in a string and then use cursor.execute() to insert records.

Earlier, you defined a wrapper function execute_query() that you used to insert records. You can use this same function now to insert records into your MySQL table. The following script inserts records into the users table using execute_query():

create_users = """
INSERT INTO
  `users` (`name`, `age`, `gender`, `nationality`)
VALUES
  ('James', 25, 'male', 'USA'),
  ('Leila', 32, 'female', 'France'),
  ('Brigitte', 35, 'female', 'England'),
  ('Mike', 40, 'male', 'Denmark'),
  ('Elizabeth', 21, 'female', 'Canada');
"""

execute_query(connection, create_users)  

The second approach uses cursor.executemany(), which accepts two parameters:

  1. The query string containing placeholders for the records to be inserted
  2. The list of records that you want to insert

Look at the following example, which inserts two records into the likes table:

sql = "INSERT INTO likes ( user_id, post_id ) VALUES ( %s, %s )"
val = [(4, 5), (3, 4)]

cursor = connection.cursor()
cursor.executemany(sql, val)
connection.commit()

It’s up to you which approach you choose to insert records into your MySQL table. If you’re an expert in SQL, then you can use .execute(). If you’re not much familiar with SQL, then it may be more straightforward for you to use .executemany(). With either of the two approaches, you can successfully insert records into the posts, comments, and likes tables.

PostgreSQL

In the previous section, you saw two approaches for inserting records into SQLite database tables. The first uses an SQL string query, and the second uses .executemany(). psycopg2 follows this second approach, though .execute() is used to execute a placeholder-based query.

You pass the SQL query with the placeholders and the list of records to .execute(). Each record in the list will be a tuple, where tuple values correspond to the column values in the database table. Here’s how you can insert user records into the users table in a PostgreSQL database:

users = [
    ("James", 25, "male", "USA"),
    ("Leila", 32, "female", "France"),
    ("Brigitte", 35, "female", "England"),
    ("Mike", 40, "male", "Denmark"),
    ("Elizabeth", 21, "female", "Canada"),
]

user_records = ", ".join(["%s"] * len(users))

insert_query = (
    f"INSERT INTO users (name, age, gender, nationality) VALUES {user_records}"
)

connection.autocommit = True
cursor = connection.cursor()
cursor.execute(insert_query, users)

The script above creates a list users that contains five user records in the form of tuples. Next, you create a placeholder string with five placeholder elements (%s) that correspond to the five user records. The placeholder string is concatenated with the query that inserts records into the users table. Finally, the query string and the user records are passed to .execute(). The above script successfully inserts five records into the users table.

Take a look at another example of inserting records into a PostgreSQL table. The following script inserts records into the posts table:

posts = [
    ("Happy", "I am feeling very happy today", 1),
    ("Hot Weather", "The weather is very hot today", 2),
    ("Help", "I need some help with my work", 2),
    ("Great News", "I am getting married", 1),
    ("Interesting Game", "It was a fantastic game of tennis", 5),
    ("Party", "Anyone up for a late-night party today?", 3),
]

post_records = ", ".join(["%s"] * len(posts))

insert_query = (
    f"INSERT INTO posts (title, description, user_id) VALUES {post_records}"
)

connection.autocommit = True
cursor = connection.cursor()
cursor.execute(insert_query, posts)

You can insert records into the comments and likes tables with the same approach.

Selecting Records

In this section, you’ll see how to select records from database tables using the different Python SQL modules. In particular, you’ll see how to perform SELECT queries on your SQLite, MySQL, and PostgreSQL databases.

SQLite

To select records using SQLite, you can again use cursor.execute(). However, after you’ve done this, you’ll need to call .fetchall(). This method returns a list of tuples where each tuple is mapped to the corresponding row in the retrieved records.

To simplify the process, you can create a function execute_read_query():

def execute_read_query(connection, query):
    cursor = connection.cursor()
    result = None
    try:
        cursor.execute(query)
        result = cursor.fetchall()
        return result
    except Error as e:
        print(f"The error '{e}' occurred")

This function accepts the connection object and the SELECT query and returns the selected record.

SELECT

Let’s now select all the records from the users table:

select_users = "SELECT * from users"
users = execute_read_query(connection, select_users)

for user in users:
    print(user)

In the above script, the SELECT query selects all the users from the users table. This is passed to the execute_read_query(), which returns all the records from the users table. The records are then traversed and printed to the console.

Note: It’s not recommended to use SELECT * on large tables since it can result in a large number of I/O operations that increase the network traffic.

The output of the above query looks like this:

(1, 'James', 25, 'male', 'USA')
(2, 'Leila', 32, 'female', 'France')
(3, 'Brigitte', 35, 'female', 'England')
(4, 'Mike', 40, 'male', 'Denmark')
(5, 'Elizabeth', 21, 'female', 'Canada')

In the same way, you can retrieve all the records from the posts table with the below script:

select_posts = "SELECT * FROM posts"
posts = execute_read_query(connection, select_posts)

for post in posts:
    print(post)

The output looks like this:

(1, 'Happy', 'I am feeling very happy today', 1)
(2, 'Hot Weather', 'The weather is very hot today', 2)
(3, 'Help', 'I need some help with my work', 2)
(4, 'Great News', 'I am getting married', 1)
(5, 'Interesting Game', 'It was a fantastic game of tennis', 5)
(6, 'Party', 'Anyone up for a late-night party today?', 3)

The result shows all the records in the posts table.

JOIN

You can also execute complex queries involving JOIN operations to retrieve data from two related tables. For instance, the following script returns the user ids and names, along with the description of the posts that these users posted:

select_users_posts = """
SELECT
  users.id,
  users.name,
  posts.description
FROM
  posts
  INNER JOIN users ON users.id = posts.user_id
"""

users_posts = execute_read_query(connection, select_users_posts)

for users_post in users_posts:
    print(users_post)

Here’s the output:

(1, 'James', 'I am feeling very happy today')
(2, 'Leila', 'The weather is very hot today')
(2, 'Leila', 'I need some help with my work')
(1, 'James', 'I am getting married')
(5, 'Elizabeth', 'It was a fantastic game of tennis')
(3, 'Brigitte', 'Anyone up for a late night party today?')

You can also select data from three related tables by implementing multiple JOIN operators. The following script returns all posts, along with the comments on the posts and the names of the users who posted the comments:

select_posts_comments_users = """
SELECT
  posts.description as post,
  text as comment,
  name
FROM
  posts
  INNER JOIN comments ON posts.id = comments.post_id
  INNER JOIN users ON users.id = comments.user_id
"""

posts_comments_users = execute_read_query(
    connection, select_posts_comments_users
)

for posts_comments_user in posts_comments_users:
    print(posts_comments_user)

The output looks like this:

('Anyone up for a late night party today?', 'Count me in', 'James')
('I need some help with my work', 'What sort of help?', 'Elizabeth')
('I am getting married', 'Congrats buddy', 'Leila')
('It was a fantastic game of tennis', 'I was rooting for Nadal though', 'Mike')
('I need some help with my work', 'Help with your thesis?', 'Leila')
('I am getting married', 'Many congratulations', 'Elizabeth')

You can see from the output that the column names are not being returned by .fetchall(). To return column names, you can use the .description attribute of the cursor object. For instance, the following list returns all the column names for the above query:

cursor = connection.cursor()
cursor.execute(select_posts_comments_users)
cursor.fetchall()

column_names = [description[0] for description in cursor.description]
print(column_names)

The output looks like this:

['post', 'comment', 'name']

You can see the names of the columns for the given query.

WHERE

Now you’ll execute a SELECT query that returns the post, along with the total number of likes that the post received:

select_post_likes = """
SELECT
  description as Post,
  COUNT(likes.id) as Likes
FROM
  likes,
  posts
WHERE
  posts.id = likes.post_id
GROUP BY
  likes.post_id
"""

post_likes = execute_read_query(connection, select_post_likes)

for post_like in post_likes:
    print(post_like)

The output is as follows:

('The weather is very hot today', 1)
('I need some help with my work', 1)
('I am getting married', 2)
('It was a fantastic game of tennis', 1)
('Anyone up for a late night party today?', 2)

By using a WHERE clause, you’re able to return more specific results.

MySQL

The process of selecting records in MySQL is absolutely identical to selecting records in SQLite. You can use cursor.execute() followed by .fetchall(). The following script creates a wrapper function execute_read_query() that you can use to select records:

def execute_read_query(connection, query):
    cursor = connection.cursor()
    result = None
    try:
        cursor.execute(query)
        result = cursor.fetchall()
        return result
    except Error as e:
        print(f"The error '{e}' occurred")

Now select all the records from the users table:

select_users = "SELECT * FROM users"
users = execute_read_query(connection, select_users)

for user in users:
    print(user)

The output will be similar to what you saw with SQLite.

PostgreSQL

The process of selecting records from a PostgreSQL table with the psycopg2 Python SQL module is similar to what you did with SQLite and MySQL. Again, you’ll use cursor.execute() followed by .fetchall() to select records from your PostgreSQL table. The following script selects all the records from the users table and prints them to the console:

def execute_read_query(connection, query):
    cursor = connection.cursor()
    result = None
    try:
        cursor.execute(query)
        result = cursor.fetchall()
        return result
    except OperationalError as e:
        print(f"The error '{e}' occurred")

select_users = "SELECT * FROM users"
users = execute_read_query(connection, select_users)

for user in users:
    print(user)

Again, the output will be similar to what you’ve seen before.

Updating Table Records

In the last section, you saw how to select records from SQLite, MySQL, and PostgreSQL databases. In this section, you’ll cover the process for updating records using the Python SQL libraries for SQLite, PostgresSQL, and MySQL.

SQLite

Updating records in SQLite is pretty straightforward. You can again make use of execute_query(). As an example, you can update the description of the post with an id of 2. First, SELECT the description of this post:

select_post_description = "SELECT description FROM posts WHERE id = 2"

post_description = execute_read_query(connection, select_post_description)

for description in post_description:
    print(description)

You should see the following output:

('The weather is very hot today',)

The following script updates the description:

update_post_description = """
UPDATE
  posts
SET
  description = "The weather has become pleasant now"
WHERE
  id = 2
"""

execute_query(connection, update_post_description)

Now, if you execute the SELECT query again, you should see the following result:

('The weather has become pleasant now',)

The output has been updated.

MySQL

The process of updating records in MySQL with mysql-connector-python is also a carbon copy of the sqlite3 Python SQL module. You need to pass the string query to cursor.execute(). For example, the following script updates the description of the post with an id of 2:

update_post_description = """
UPDATE
  posts
SET
  description = "The weather has become pleasant now"
WHERE
  id = 2
"""

execute_query(connection,  update_post_description)

Again, you’ve used your wrapper function execute_query() to update the post description.

PostgreSQL

The update query for PostgreSQL is similar to what you’ve seen with SQLite and MySQL. You can use the above scripts to update records in your PostgreSQL table.

Deleting Table Records

In this section, you’ll see how to delete table records using the Python SQL modules for SQLite, MySQL, and PostgreSQL databases. The process of deleting records is uniform for all three databases since the DELETE query for the three databases is the same.

SQLite

You can again use execute_query() to delete records from YOUR SQLite database. All you have to do is pass the connection object and the string query for the record you want to delete to execute_query(). Then, execute_query() will create a cursor object using the connection and pass the string query to cursor.execute(), which will delete the records.

As an example, try to delete the comment with an id of 5:

delete_comment = "DELETE FROM comments WHERE id = 5"
execute_query(connection, delete_comment)

Now, if you select all the records from the comments table, you’ll see that the fifth comment has been deleted.

MySQL

The process for deletion in MySQL is also similar to SQLite, as shown in the following example:

delete_comment = "DELETE FROM comments WHERE id = 2"
execute_query(connection, delete_comment)

Here, you delete the second comment from the sm_app database’s comments table in your MySQL database server.

PostgreSQL

The delete query for PostgreSQL is also similar to SQLite and MySQL. You can write a delete query string by using the DELETE keyword and then passing the query and the connection object to execute_query(). This will delete the specified records from your PostgreSQL database.

Conclusion

In this tutorial, you’ve learned how to use three common Python SQL libraries. sqlite3, mysql-connector-python, and psycopg2 allow you to connect a Python application to SQLite, MySQL, and PostgreSQL databases, respectively.

Now you can:

However, this is just the tip of the iceberg! There are also Python SQL libraries for object-relational mapping, such as SQLAlchemy and Django ORM, that automate the task of database interaction in Python. You’ll learn more about these libraries in other tutorials in our Python databases section.


[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

February 24, 2020 02:00 PM UTC


Roberto Alsina

Episodio 23: Androides Linuxeros

Demostrando Anbox, un paquete para usar aplicaciones Android en Linux!

February 24, 2020 01:17 PM UTC


pgcli

Welcome IRedis

We are happy to welcome IRedis to the dbcli org.

IRedis is A Terminal Client for Redis with AutoCompletion and Syntax Highlighting.

IRedis is written in python using the wonderful prompt-toolkit library. It is cross-platform compatible and it is tested on Linux, MacOS and Windows.

IRedis ships with a lot of user-friendly features. One new innovative feature is the ability to pipe the output of a redis command to a unix command. Here's an example of piping JSON to jq:

The project is lead by 赖信涛.

IRedis is the latest addition to the DBCLI suite of tools.

February 24, 2020 08:00 AM UTC


Mike Driscoll

PyDev of the Week: Hameer Abbasi

This week we welcome Hameer Abbasi as our PyDev of the Week! Hameer works on the PyData Sparse project. You can check out what else Hameer is working on over on Github. Let’s take some time to get to know him better!

Can you tell us a little about yourself (hobbies, education, etc):

My hobby is, and has been for a while, scientific computing in general, the ecosystem and how to make it better. I’m lucky and grateful to have found a job in that same field, even though my formal education wasn’t in either Mathematics or Computer Science. Moving over to my education, I completed my Bachelors in Electrical (Telecommunications) Engineering from National University of Sciences and Technology, Pakistan in July 2014. After being a professional for a year at LMK Resources, Pakistan until September, 2015, I moved to Germany and completed my Masters in Information and Communication Engineering from Technische Universität Darmstadt (English: Technical University of Darmstadt) in October, 2015. I started with Quansight as a contractor then, and I’m continuing that to date.

Why did you start using Python?

I was doing a Hilfswissenschaftler job (sort of like a Research Assistant in the USA), and there I was presented the problem of scaling a sparse system to a larger space. I discovered the PyData/Sparse project back then (it was in Matthew Rocklin’s personal repository at the time), and was immediately fascinated by the idea of computational gains to be had if one moved to a sparse representation. I’m now the maintainer for that project, and I’m grateful I chose that path, as it landed me a talk at SciPy 2018 and a client in the form of Quansight.

What other programming languages do you know and which is your favorite?

I’ve dabbled in a lot of programming languages over the years. Started with Visual Basic 2000, moved on to Visual Basic .NET, HTML, Java, Javascript, C++. The ones I really feel I know, though are Python and C#, because I have hands on experience on real projects with these. I like Rust’s “do it right the first time” model.

My favourite of all these to work with is probably C#, because of the excellent tooling around it, but as a language I like Python more.

What projects are you working on now?

I’m working on a number of client projects with Quansight, along with others that are in their Labs division. These include uarray, a backend-dispatch system with various utilities, unumpy, a “backend-agnostic” version of NumPy, and udiff, an automatic differentiation library built on top of unumpy. I also recently started some research on PyData/Sparse again. I’d like to talk about the uarray family a bit more — It’s awesome that you can just take a piece of code, change out a with statement and/or an import, and watch the magic.

Which Python libraries are your favorite (core or 3rd party)?

Probably XND, it’s really well engineered and shows a lot of potential. Now if there was an active maintainer on it…

How did you get started with the PyData/Sparse project?

I, honestly, was slacking off from my thesis working on what interested me, the idea of huge computational gain just from moving to a sparse structure.

What makes PyData/Sparse great?

Well, try it and find out. ????

Can you describe any current challenges that you see for Python in data science?

The need for paid maintainers on projects that have core infrastructure. Travis Oliphant (CEO at Quansight and OpenTeams) has talked about this in length.

Thanks for doing the interview, Hameer!

The post PyDev of the Week: Hameer Abbasi appeared first on The Mouse Vs. The Python.

February 24, 2020 06:05 AM UTC


tryexceptpass

Episode 4 - 7 Practices for High Quality Maintainable Code

February 24, 2020 04:00 AM UTC

February 23, 2020


Will McGugan

Better Python tracebacks with Rich

One of my goals in writing Rich was to render really nice Python tracebacks. And now that feature has landed.

I've never found Python tracebacks to be a great debugging aid beyond telling me what the exception was, and where it occurred. In a recent update to Rich, I've tried to refresh the humble traceback to give enough context to diagnose errors before switching back to the editor.

Here's an example of a rich traceback:

© 2020 Will McGugan

Rich traceback on OSX

There is highlighting to help pick out filename, line, and function, etc. There's also a snippet of code for each stack frame, with line numbers and syntax highlighting. It's configurable, but I find that 7 lines of code are enough to make it relatable to the file in my editor, and give me a better understanding of the context that lead to the exception.

Here's how tracebacks render on Windows:

© 2020 Will McGugan

Rich traceback on Windows terminal

For reference, here's the same traceback rendered in a more traditional way:

© 2020 Will McGugan

Just a regular old traceback

To try out rich tracebacks, install the exception handler as follows:

from rich.traceback import install
install()

Now any uncaught exceptions will rendered by Rich. See the docs for details.

Rich is quite usable as a library now, but is still in active development. If you have any ideas on how to improve rich tracebacks or any other aspect of the library, let me know.

February 23, 2020 09:22 PM UTC


Hynek Schlawack

Python in Production

I’m missing a key part from the public Python discourse and I would like to help to change that.

February 23, 2020 04:45 PM UTC