
Planet Python

Last update: March 21, 2023 04:42 AM UTC

March 21, 2023


Codementor

Python Position and Keyword Only Arguments

What do / and * mean in a Python function definition?
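
As a quick illustration of the syntax in question (a minimal sketch, not taken from the article itself): parameters before / are positional-only, while parameters after * are keyword-only.

def divide(numerator, denominator, /, *, precision=2):
    # numerator and denominator are positional-only; precision is keyword-only
    return round(numerator / denominator, precision)

divide(22, 7, precision=3)              # OK
# divide(numerator=22, denominator=7)   # TypeError: positional-only arguments
# divide(22, 7, 3)                      # TypeError: precision is keyword-only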

March 21, 2023 04:14 AM UTC

March 20, 2023


TestDriven.io

Django Performance Optimization Tips

This article looks at where potential performance issues can occur in a Django application and how to address them in order to speed up your app.

March 20, 2023 10:28 PM UTC


Django Weblog

Want to host DjangoCon Europe 2024?

DjangoCon Europe 2023 will be held May 29th-June 2nd in Edinburgh, Scotland, but we're already looking ahead to next year's conference. Could your town - or your football stadium, circus tent, private island or city hall - host this wonderful community event?

Hosting a DjangoCon is an ambitious undertaking. It's hard work, but each year it has been successfully run by a team of community volunteers, not all of whom have had previous experience - more important is enthusiasm, organisational skills, the ability to plan and manage budgets, time and people - and plenty of time to invest in the project.

You'll find plenty of support on offer from previous DjangoCon organisers, so you won't be on your own.

How to apply

If you're interested, we'd love to hear from you. Following the established tradition, the selected hosts will be announced at this year's DjangoCon by last year's organisers. The conference dates must fall more than one month from DjangoCon US, PyCon US, and EuroPython in the same calendar year. In order to make the announcement at DjangoCon Europe, we will need to receive your proposal by May 10.

The more detailed and complete your proposal, the better. Detailed plans help show that your proposal is serious and thorough and that you have the organisational capacity to make the event a success.

Email your proposals to djangocon-europe-2024-proposals at djangoproject dot com.

We will be hosting a virtual informational session for those who are, or may be, interested in organising a DjangoCon. Please indicate your interest here.

If you have any questions or concerns about organising a DjangoCon, just drop us a line.

March 20, 2023 08:21 PM UTC


Python Morsels

What is a context manager?

Context managers power Python's with blocks. They sandwich a code block between enter code and exit code. They're most often used for reusing common cleanup/teardown functionality.

Table of contents

  1. Files opened with with close automatically
  2. Context managers work in with statements
  3. Context managers are like a try-finally block
  4. Life without a context manager
  5. Using a with block requires a context manager

Files opened with with close automatically

Context managers are objects that can be used in Python's with statements.

You'll often see with statements used when working with files in Python.

This code opens a file, uses the f variable to point to the file object, reads from the file, and then closes the file:

>>> with open("my_file.txt") as f:
...     contents = f.read()
...

Notice that we didn't explicitly tell Python to close our file.

But the file did close:

>>> f.closed
True

The file closed automatically when the with block was exited.
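
As the table of contents hints, that with block is roughly equivalent to a try-finally version (a sketch of the equivalence, not code from the article):

>>> f = open("my_file.txt")
>>> try:
...     contents = f.read()
... finally:
...     f.close()
...

The with statement bundles the enter step (opening) and the exit step (closing) together, so the cleanup can't be forgotten.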

Context managers work in with statements

Any object that can be …

Read the full article: https://www.pythonmorsels.com/what-is-a-context-manager/
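
For a taste of where the article is heading: any object that implements __enter__ and __exit__ can serve as a context manager. Here's a minimal sketch, assuming nothing beyond the protocol itself:

class Greeter:
    def __enter__(self):
        print("entering")  # runs at the top of the with block
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        print("exiting")   # runs even if the block raises
        return False       # don't suppress exceptions

with Greeter():
    print("inside the block")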

March 20, 2023 03:00 PM UTC


Real Python

Executing Python Scripts With a Shebang

When you read someone else’s Python code, you frequently see a mysterious line, which always appears at the top of the file, starting with the distinctive shebang (#!) sequence. It looks like a not-so-useful comment, but other than that, it doesn’t resemble anything else you’ve learned about Python, making you wonder what that is and why it’s there. As if that wasn’t enough to confuse you, the shebang line only appears in some Python modules.

In this tutorial, you’ll:

  • Learn what a shebang is
  • Decide when to include the shebang in Python scripts
  • Define the shebang in a portable way across systems
  • Pass arguments to the command defined in a shebang
  • Know the shebang’s limitations and some of its alternatives
  • Execute scripts through a custom interpreter written in Python

To proceed, you should have basic familiarity with the command line and know how to run Python scripts from it. You can also download the supporting materials for this tutorial to follow along with the code examples:

Free Sample Code: Click here to download the free sample code that you’ll use to execute Python scripts with a shebang.

What’s a Shebang, and When Should You Use It?

In short, a shebang is a special kind of comment that you may include in your source code to tell the operating system’s shell where to find the interpreter for the rest of the file:

#!/usr/bin/python3

print("Hello, World!")

If you’re using a shebang, it must appear on the first line in your script, and it has to start with a hash sign (#) followed by an exclamation mark (!), colloquially known as the bang, hence the name shebang. The choice of the hash sign to begin this special sequence of characters wasn’t accidental, as many scripting languages use it for inline comments.

Make sure you don't put any other comments or blank lines before the shebang, or it won't be recognized. After the exclamation mark, specify an absolute path to the relevant code interpreter, such as Python. Providing a relative path will have no effect, unfortunately.

Note: The shebang is only recognized by shells, such as Z shell or Bash, running on Unix-like operating systems, including macOS and Linux distributions. It bears no particular meaning in the Windows terminal, which treats the shebang as an ordinary comment by ignoring it.

You can get the shebang to work on Windows by installing the Windows Subsystem for Linux (WSL) that comes with a Unix shell. Alternatively, Windows lets you make a global file association between a file extension like .py and a program, such as the Python interpreter, to achieve a similar effect.

It’s not uncommon to combine a shebang with the name-main idiom, which prevents the main block of code from running when someone imports the file from another module:

#!/usr/bin/python3

if __name__ == "__main__":
    print("Hello, World!")

With this conditional statement, Python will call the print() function only when you run this module directly as a script—for example, by providing its path to the Python interpreter:

$ python3 /path/to/your/script.py
Hello, World!

As long as the script’s content starts with a correctly defined shebang line and your system user has permission to execute the corresponding file, you can omit the python3 command to run that script:

$ /path/to/your/script.py
Hello, World!

A shebang is only relevant to runnable scripts that you wish to execute without explicitly specifying the program to run them through. You wouldn’t typically put a shebang in a Python module that only contains function and class definitions meant for importing from other modules. Therefore, use the shebang when you don’t want to prefix the command that runs your Python script with python or python3.
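
The tutorial covers portability in detail later, but as a quick preview, a common portable form uses env to look up python3 on your PATH instead of hard-coding an absolute path (a standard idiom, shown here as a sketch):

#!/usr/bin/env python3

if __name__ == "__main__":
    print("Hello, World!")

This runs whichever python3 appears first on your PATH, which also lets the script pick up an active virtual environment.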

Note: In the old days of Python, the shebang line would sometimes appear alongside another specially formatted comment described in PEP 263:

#!/usr/bin/python3
# -*- coding: utf-8 -*-

if __name__ == "__main__":
    print("Grüß Gott")

The highlighted line used to be necessary to tell the interpreter which character encoding it should use to read your source code correctly, as Python defaulted to ASCII. However, this was only important when you directly embedded non-Latin characters, such as ü or ß, in your code.

This special comment is irrelevant today because modern Python versions use the universal UTF-8 encoding, which can handle such characters with ease. Nevertheless, it’s always preferable to replace tricky characters with their encoded representations using Unicode literals:

>>> "Grüß Gott".encode("unicode_escape")
b'Gr\\xfc\\xdf Gott'

Your foreign colleagues who have different keyboard layouts will thank you for that!

Now that you have a high-level understanding of what a shebang is and when to use it, you’re ready to explore it in more detail. In the next section, you’ll take a closer look at how it works.

How Does a Shebang Work?

Normally, to run a program in the terminal, you must provide the full path to a particular binary executable or the name of a command present in one of the directories listed on the PATH environment variable. One or more command-line arguments may follow this path or command:

$ /usr/bin/python3 -c 'print("Hello, World!")'
Hello, World!

$ python3 -c 'print("Hello, World!")'
Hello, World!

Here, you run the Python interpreter in a non-interactive mode against a one-liner program passed through the -c option. In the first case, you provide an absolute path to python3, while in the second case, you rely on the fact that the parent folder, /usr/bin/, is included on the search path by default. Your shell can find the Python executable, even if you don’t provide the full path, by looking through the directories on the PATH variable.

Note: If multiple commands with the same name exist in more than one directory listed on the PATH variable, then your shell will execute the first it can find. As a result, the outcome of running a command without explicitly specifying the corresponding path may sometimes be surprising. It’ll depend on the order of directories in your PATH variable. However, this can be useful, as you’ll find out later.
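
You can check which executable your shell would pick without leaving Python by using shutil.which, which walks the PATH directories in order (a small illustration, not part of the original tutorial; the exact path returned depends on your system):

>>> import shutil
>>> shutil.which("python3")  # the first match on the PATH wins
'/usr/bin/python3'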

Read the full article at https://realpython.com/python-shebang/ »



March 20, 2023 02:00 PM UTC


Mike Driscoll

PyDev of the Week: Pierre Raybaut

Today we welcome Pierre Raybaut (@pierreraybaut) as our PyDev of the Week! Pierre is the creator of Spyder, the Scientific Python IDE. Pierre is also the creator of pythonxy and WinPython.

You can see what other projects Pierre is part of over on Pierre’s GitHub Profile.

Now let’s spend some time getting to know Pierre better!

 

 

Pierre Raybaut

Can you tell us a little about yourself (hobbies, education, etc):

The first code I wrote was an Applesoft BASIC program, on an Apple //e computer… I was 10 years old. Since then, I've always managed to bring computers into everything I did, at home or at work. As I was an amateur astronomer and also very fond of physics in general, I chose to follow scientific studies. A few years later, I specialized in optics and photonics and graduated from Institut d'Optique Graduate School, which is now part of Université Paris-Saclay. I then pursued a PhD in the field of femtosecond lasers. Although it was mainly experimental physics, I had the opportunity to develop code for simulating regenerative amplification in ultra-short pulse lasers; I learned recently that this code is still used today! After my PhD, I worked as a research engineer at THALES Avionics, developing innovative head-up displays for aircraft. Then, in 2007, I joined the French Alternative Energies and Atomic Energy Commission (CEA), where I was hired as lead software developer for applications involving image and signal processing as well as the control of scientific instruments. In 2012, I was given a project management position for the Laser Mégajoule timing and fiducial system development. Four years later, I was appointed head of a research laboratory. Lastly, in 2018 I had the opportunity to join Codra, an industrial software company, as a Project Director. In addition to this position, I am currently the pre-sales manager for the department of engineering at Codra. And of course, I've also been involved in open-source software development since 2008.

Why did you start using Python?

I started using Python in 2008, after a long and meticulous evaluation of various solutions that might fit my needs. I had been part of a research team at CEA since early 2007. When I joined, every processing and acquisition application was written using commercial software. Some applications were getting huge and complex, with a lot of GUIs for editing tons of parameters or visualizing results. Robustness was the main concern, so I chose Python because it provided all the necessary tools for our research work (interactive computing and scientific data plotting) as well as the general-purpose libraries for building stable and robust applications. In 2008, when I started using and promoting Python amongst my colleagues, a piece of the puzzle was still missing: Python had no scientific-oriented IDE! That's why, during my vacations, I began coding some tools to fill gaps in the Python ecosystem, using Qt GUIs. After writing a variable explorer GUI that could be used directly from a Python interpreter to interact with the current namespace, I wrote a Qt-based Python code editor, then a Qt-based Python console… and so on. After only a few weeks, this was done! It ultimately resulted in Spyder (Scientific PYthon Development EnviRonment), a scientific Python IDE that I first released to the public in September 2009: Python was finally a viable alternative to commercial scientific software. Today, thanks to a development team led by Carlos Cordoba since 2012, Spyder is widely used for data processing and visualization with Python (est. 500,000 downloads/day).

What other programming languages do you know and which is your favorite?

As you know, Python is quite open to other languages. When using Python for signal or image processing, it is sometimes necessary to write extensions in C/C++ (or even Fortran) for performance reasons. For example, writing Fortran code for image processing is quite fun, because there is absolutely no interface code to take care of. Cython is also an elegant solution, as it allows progressive optimization of a pure Python algorithm. Finally, on some projects implemented at Codra, I had to make adjustments to code written in C#. I've also investigated projects using other languages (JavaScript, TypeScript, …). So I've played with a few languages, but Python is the one that has given me the most satisfaction, especially when trying to write clean code with the help of quality-related tools like Black, isort, or Pylint.

What projects are you working on now?

At Codra, I’m involved in a lot of projects as a Project Director (or technical expert), in various fields like supervisory systems, data acquisition, multi-protocol gateways, data processing, data visualization, etc. From time to time, I even play the role of Project Manager. This is how I’ve been involved lately in CodraFT development, which was supported by CEA. It is available freely on GitHub: this is a Python-Qt based software aiming at processing signals and images and visualizing them. Its main upside is testability: the objective was to create a data processing software with a high level of robustness. Data processing features are mainly based on NumPy, SciPy and scikit-image.

Which Python libraries are your favorite (core or 3rd party)?

At the moment, I'm quite fond of scikit-image for image processing; it has a nice, clean API and great documentation. OpenCV is also a great tool available to Python users, providing very efficient pattern detection algorithms, for example.

What are some of the big lessons you learned while working on Spyder or WinPython?

I think the most important lesson I've learned during those years is that we need to collaborate with other people. Otherwise, in the end, projects will at best remain good ideas, or will be discontinued. With Spyder and WinPython, the thing I'm most proud of is that I managed to trust someone else to take over the projects and maintain them: in both cases, it was a good decision, and the projects are still active and popular.

Is there anything else you’d like to say?

I recently had the opportunity to attend a conference around Jupyter (PyData Paris). I really admire the work that has been done around the Jupyter ecosystem. From the IPython version I played with in 2008 to today's JupyterLab, what an achievement, from a technical point of view as well as in terms of community and project management!

Thanks for doing the interview, Pierre!

The post PyDev of the Week: Pierre Raybaut appeared first on Mouse Vs Python.

March 20, 2023 12:30 PM UTC


Django Weblog

Django 4.2 release candidate 1 released

Django 4.2 release candidate 1 is the final opportunity for you to try out the farrago of new features before Django 4.2 is released.

The release candidate stage marks the string freeze and the call for translators to submit translations. Provided no major bugs are discovered that can't be solved in the next two weeks, Django 4.2 will be released on or around April 3. Any delays will be communicated on the Django forum.

Please use this opportunity to help find and fix bugs (which should be reported to the issue tracker). You can grab a copy of the package from our downloads page or on PyPI.

The PGP key ID used for this release is Mariusz Felisiak: 2EF56372BA48CD1B.

March 20, 2023 07:33 AM UTC


Python GUIs

Working With Git and Github in Your Python Projects

Using a version control system (VCS) is crucial for any software development project. These systems allow developers to track changes to the project's codebase over time, removing the need to keep multiple copies of the project folder.

VCSs also facilitate experimenting with new features and ideas without breaking existing functionality in a given project. They also enable collaboration with other developers, who can contribute code, documentation, and more.

In this article, we'll learn about Git, the most popular VCS out there. We'll learn everything we need to get started with this VCS and start creating our own repositories. We'll also learn how to publish those repositories to GitHub, another popular tool among developers nowadays.

Installing and Setting Up Git

To use Git in our coding projects, we first need to install it on our computer. To do this, we need to navigate to Git's download page and choose the appropriate installer for our operating system. Once we've downloaded the installer, we need to run it and follow the on-screen instructions.

We can check if everything is working correctly by opening a terminal or command-line window and running git --version.

Once we've confirmed the successful installation, we should provide Git with some personal information. You'll only need to do this once for every computer. Now go ahead and run the following commands with your own information:

shell
$ git config --global user.name "<YOUR NAME>"
$ git config --global user.email "<name@email.com>"

The first command adds your full name to Git's config file. The second command adds your email. Git will use this information in all your repositories.

If you publish your projects to a remote server like GitHub, then your email address will be visible to anyone with access to that repository. If you don't want to expose your email address this way, then you should create a separate email address to use with Git.

As you'll learn in a moment, Git uses the concept of branches to manage its repositories. A branch is a copy of your project's folder at a given time in the development cycle. The default branch of new repositories is named either master or main, depending on your current version of Git.

You can change the name of the default branch by running the following command:

shell
$ git config --global init.defaultBranch <branch_name>

This command will set the name of Git's default branch to branch_name. Remember that this is just a placeholder name. You need to provide a suitable name for your installation.

Another useful setting is the default text editor Git will use to type in commit messages and other messages in your repo. For example, if you use an editor like Visual Studio Code, then you can configure Git to use it:

shell
# Visual Studio Code
$ git config --global core.editor "code --wait"

With this command, we tell Git to use VS Code to process commit messages and any other message we need to enter through Git.

Finally, to inspect the changes we've made to Git's configuration files, we can run the following command:

shell
$ git config --global -e

This command will open the global .gitconfig file in our default editor. There, we can fix any error we have made or add new settings. Then we just need to save the file and close it.

Understanding How Git Works

Git works by allowing us to take a snapshot of the current state of all the files in our project's folder. Each time we save one of those snapshots, we make a Git commit. Then the cycle starts again, and Git creates new snapshots, showing how our project looked at any given moment.

Git was created in 2005 by Linus Torvalds, the creator of the Linux kernel. Git is an open-source project that is licensed under the GNU General Public License (GPL) v2. It was initially made to facilitate kernel development due to the lack of a suitable alternative.

The general workflow for saving different snapshots with Git commits goes through the following steps:

  1. Change the content of our project's folder.
  2. Stage or mark the changes we want to save in our next commit.
  3. Commit or save the changes permanently in our project's Git database.

As the third step mentions, Git uses a special database called a repository. This database is kept inside your project's directory under a folder called .git.

Version-Controlling a Project With Git: The Basics

In this section, we'll create a local repository and learn how to manage it using the Git command-line interface (CLI). On macOS and Linux, we can use the default terminal application to follow along with this tutorial.

On Windows, we recommend using Git Bash, which is part of the Git For Windows package. Go to the Git Bash download page, get the installer, run it, and follow the on-screen instructions. Make sure to check Additional Icons -> On the Desktop during installation so that you can quickly find and launch the app from your desktop.

Alternatively, you can also use either Windows' Command Prompt or PowerShell. However, some commands may differ from the commands used in this tutorial.

Initializing a Git Repository

To start version-controlling a project, we need to initialize a new Git repository in the project's root folder or directory. In this tutorial, we'll use a sample project to facilitate the explanation. Go ahead and create a new folder in your file system. Then navigate to that folder in your terminal by running these commands:

shell
$ mkdir sample_project
$ cd sample_project

The first command creates the project's root folder or directory, while the second command allows you to navigate into that folder. Don't close your terminal window. You'll be using it throughout the next sections.

To initialize a Git repository in this folder, we need to use the git init command like in the example below:

shell
$ git init
Initialized empty Git repository in /.../sample_project/.git/

This command creates a subfolder called .git inside the project's folder. The leading dot in the folder's name means that this is a hidden directory, so you may not see anything in your file manager. You can check the existence of .git with the ls -a command, which lists all files in a given folder, including hidden ones.

Checking the Status of Our Project

Git provides the git status command to allow us to identify the current state of a Git repository. Because our sample_project folder is still empty, running git status will display something like this:

shell
$ git status
On branch main

No commits yet

nothing to commit (create/copy files and use "git add" to track)

When we run git status, we get detailed information about the current state of our Git repository. This command is pretty useful, and we'll return to it repeatedly throughout this tutorial.

As an example of how useful the git status command is, go ahead and create a file called main.py inside the project's folder using the following commands:

shell
$ touch main.py

$ git status
On branch main

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)
    main.py

nothing added to commit but untracked files present (use "git add" to track)

With the touch command, we create a new main.py file under our project's folder. Then we run git status again. This time, we get information about the presence of an untracked file called main.py. We also get some basic instructions on how to add this file to our Git repo. Providing these guidelines or instructions is one of the neatest features of git status.

Now, what is all that about untracked files? In the following section, we'll learn more about this topic.

Tracking and Committing Changes

A file in a Git repository can be either tracked or untracked. Any file that wasn't present in the last commit is considered an untracked file. Git doesn't keep a history of changes for untracked files in your project's folder.

In our example, we haven't made any commits to our Git repo, so main.py is naturally untracked. To start tracking it, run the git add command as follows:

shell
$ git add main.py

$ git status
On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
    new file:   main.py

This git add command has added main.py to the list of tracked files. Now it's time to save the file permanently using the git commit command with an appropriate commit message provided with the -m option:

shell
$ git commit -m "Add main.py"
[main (root-commit) 5ac6586] Add main.py
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 main.py

$ git status
On branch main
nothing to commit, working tree clean

We have successfully made our first commit, saving main.py to our Git repository. The git commit command requires a commit message, which we can provide through the -m option. Commit messages should clearly describe what we have changed in our project.

After the commit, our main branch is completely clean, as you can conclude from the git status output.

Now let's start the cycle again by modifying main.py, staging the changes, and creating a new commit. Go ahead and run the following commands:

shell
$ echo "print('Hello, World!')" > main.py
$ cat main.py
print('Hello, World!')

$ git add main.py

$ git commit -m "Create a 'Hello, World!' script  on  main.py"
[main 2f33f7e] Create a 'Hello, World!' script  on  main.py
 1 file changed, 1 insertion(+)

The echo command adds the statement "print('Hello, World!')" to our main.py file. You can confirm this addition with the cat command, which lists the content of one or more target files. You can also open main.py in your favorite editor and update the file there if you prefer.

We can also use the git stage command to stage or add files to a Git repository and include them in our next commit.

We've made two commits to our Git repo. We can list our commit history using the git log command as follows:

shell
$ git log --oneline
2f33f7e (HEAD -> main) Create a 'Hello, World!' script on main.py
5ac6586 Add main.py

The git log command allows us to list all our previous commits. In this example, we've used the --oneline option to list each commit on a single line. Longer histories are shown in a pager; to leave it, we can press the letter Q on our keyboard.

Using a .gitignore File to Skip Unneeded Files

While working with Git, we will often have files and folders that we must not save to our Git repo. For example, most Python projects include a venv/ folder with a virtual environment for that project. Go ahead and create one with the following command:

shell
$ python -m venv venv

Once we've added a Python virtual environment to our project's folder, we can run git status again to check the repo state:

shell
$ git status
On branch main
Untracked files:
  (use "git add <file>..." to include in what will be committed)
    venv/

nothing added to commit but untracked files present (use "git add" to track)

Now the venv/ folder appears as an untracked file in our Git repository. We don't need to keep track of this folder because it's not part of our project's codebase. It's only a tool for working on the project. So, we need to ignore this folder. To do that, we can add the folder to a .gitignore file.

Go ahead and create a .gitignore file in the project's folder. Add the venv/ folder to it and run git status:

shell
$ touch .gitignore
$ echo "venv/" > .gitignore
$ git status
On branch main
Untracked files:
  (use "git add <file>..." to include in what will be committed)
    .gitignore

nothing added to commit but untracked files present (use "git add" to track)

Now git status doesn't list venv/ as an untracked file. This means that Git is ignoring that folder. If we take a look at the output, we'll see that .gitignore is now listed as an untracked file. We must commit our .gitignore file to the Git repository. This will prevent other developers working with us from having to create their own local .gitignore files.

We can also list multiple files and folders in our .gitignore file one per line. The file even accepts glob patterns to match specific types of files, such as *.txt. If you want to save yourself some work, then you can take advantage of GitHub's gitignore repository, which provides a rich list of predefined .gitignore files for different programming languages and development environments.
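
To get a feel for how such glob patterns match file names, you can experiment with Python's fnmatch module, which implements similar Unix-style globbing (an illustration only; Git's matcher has a few extra rules of its own):

>>> import fnmatch
>>> fnmatch.fnmatch("notes.txt", "*.txt")  # "*.txt" matches any .txt file
True
>>> fnmatch.fnmatch("main.py", "*.txt")
False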

We can also set up a global .gitignore file on our computer. This global file will apply to all our Git repositories. If you decide to use this option, then go ahead and create a .gitignore_global file in your home folder and tell Git about it by running git config --global core.excludesFile ~/.gitignore_global.

Working With Branches in Git

One of the most powerful features of Git is that it allows us to create multiple branches. A branch is a copy of our project's current state and commit history. Having the option to create and handle branches allows us to make changes to our project without messing up the main line of development.

We'll often find that software projects maintain several independent branches to facilitate the development process. A common branch model distinguishes between four different types of branches:

  1. A main or master branch that holds the main line of development
  2. A develop branch that holds the last developments
  3. One or more feature branches that hold changes intended to add new features
  4. One or more bugfix branches that hold changes intended to fix critical bugs

However, the branching model to use is up to you. In the following sections, we'll learn how to manage branches using Git.

Creating New Branches

Working all the time on the main or master branch isn't a good idea. We can end up creating a mess and breaking the code. So, whenever we want to experiment with a new idea, implement a new feature, fix a bug, or just refactor a piece of code, we should create a new branch.

To kick things off, let's create a new branch called hello on our Git repo under the sample_project folder. To do that, we can use the git branch command followed by the branch's name:

shell
$ git branch hello
$ git branch --list
  hello
* main

The first command creates a new branch in our Git repo. The second command allows us to list all the branches that currently exist in our repository. Again, we can press the letter Q on our keyboard to get back to the terminal prompt.

The star symbol denotes the currently active branch, which is main in the example. We want to work on hello, so we need to activate that branch. In Git's terminology, we need to check out to hello.

Checking Out to a New Branch

Although we have just created a new branch, in order to start working on it, we need to switch to or check out to it by using the git checkout command as follows:

shell
$ git checkout hello
Switched to branch 'hello'

$ git branch --list
* hello
  main

$ git log --oneline
2f33f7e (HEAD -> hello, main) Create a 'Hello, World!' script on main.py
5ac6586 Add main.py

The git checkout command takes the name of an existing branch as an argument. Once we run the command, Git takes us to the target branch.

We can derive a new branch from whatever branch we need.

As you can see, git branch --list indicates which branch we are currently on by placing a * symbol in front of the relevant branch name. If we check the commit history with git log --oneline, then we'll get the same as we get from main because hello is a copy of it.

The git checkout command can take a -b flag that we can use to create a new branch and immediately check out to it in a single step. That's what most developers use while working with Git repositories. In our example, we could have run git checkout -b hello to create the hello branch and check out to it with one command.

Let's make some changes to our project and create another commit. Go ahead and run the following commands:

shell
$ echo "print('Welcome to PythonGUIs!')" >> main.py
$ cat main.py
print('Hello, World!')
print('Welcome to PythonGUIs!')

$ git add main.py
$ git commit -m "Extend our 'Hello, World' program with a welcome message."
[hello be62476] Extend our 'Hello, World' program with a welcome message.
 1 file changed, 1 insertion(+)

The final command committed our changes to the hello branch. If we compare the commit history of both branches, then we'll see the difference:

shell
$ git log --oneline -1
be62476 (HEAD -> hello) Extend our 'Hello, World' program with a welcome message.

$ git checkout main
Switched to branch 'main'

$ git log --oneline -1
2f33f7e (HEAD -> main) Create a 'Hello, World!' script on main.py

In this example, we first run git log --oneline with -1 as an argument. This argument tells Git to give us only the last commit in the active branch's commit history. To inspect the commit history of main, we first need to check out to that branch. Then we can run the same git log command.

Now say that we're happy with the changes we've made to our project in the hello branch, and we want to update main with those changes. How can we do this? We need to merge hello into main.

Merging Two Branches Together

To add the commits we've made in a separate branch back to another branch, we can run what is known as a merge. For example, say we want to merge the new commits in hello into main. In this case, we first need to switch back to main and then run the git merge command using hello as an argument:

shell
$ git checkout main
Already on 'main'

$ git merge hello
Updating 2f33f7e..be62476
Fast-forward
 main.py | 1 +
 1 file changed, 1 insertion(+)

To merge a branch into another branch, we first need to check out the branch we want to update. Then we can run git merge. In the example above, we first check out to main. Once there, we can merge hello.

Deleting Unused Branches

Once we've finished working in a given branch, we can delete the entire branch to keep our repo as clean as possible. Following our example, now that we've merged hello into main, we can remove hello.

To remove a branch from a Git repo, we use the git branch command with the --delete option. To successfully run this command, make sure to switch to another branch first:

shell
$ git checkout main
Already on 'main'

$ git branch --delete hello
Deleted branch hello (was be62476).

$ git branch --list
* main

Deleting unused branches is a good way to keep our Git repositories clean, organized, and up to date. Deleting them as soon as we finish the work is even better, because old branches lying around may confuse other developers collaborating on our project, who might end up wondering why those branches are still alive.

Using a GUI Client for Git

In the previous sections, we've learned to use the git command-line tool to manage Git repositories. If you prefer to use GUI tools, then you'll find a bunch of third-party GUI frontends for Git. While they won't completely replace the need for using the command-line tool, they can simplify your day-to-day workflow.

You can get a complete list of standalone GUI clients available on the Git official documentation.

Most popular IDEs and code editors, including PyCharm and Visual Studio Code, come with basic Git integration out-of-the-box. Some developers will prefer this approach as it is directly integrated with their development environment of choice.

If you need something more advanced, then GitKraken is probably a good choice. This tool provides a standalone, cross-platform GUI client for Git that comes with many additional features that can boost your productivity.

Managing a Project With GitHub

If we publish a project on a remote server with support for Git repositories, then anyone with appropriate permissions can clone our project, creating a local copy on their computer. Then, they can make changes to our project, commit them to their local copy, and finally push the changes back to the remote server. This workflow provides a straightforward way to allow other developers to contribute code to your projects.

In the following sections, we'll learn how to create a remote repository on GitHub and then push our existing local repository to it. Before we do that, though, head over to GitHub.com and create an account there if you don't have one yet. Once you have a GitHub account, you can set up the connection to that account so that you can use it with Git.

Setting Up a Secure Connection to GitHub

In order to work with GitHub via the git command, we need to be able to authenticate ourselves. There are a few ways of doing that. However, using SSH is the recommended way. The first step in the process is to generate an SSH key, which you can do with the following command:

shell
$ ssh-keygen -t ed25519 -C "GitHub - name@email.com"

Replace the placeholder email address with the address you've associated with your GitHub account. Once you run this command, you'll get three different prompts in a row. You can respond to them by pressing Enter to accept the default option. Alternatively, you can provide custom responses.

Next, we need to copy the contents of our id_ed25519.pub file. To do this, you can run the following command:

shell
$ cat ~/.ssh/id_ed25519.pub

Select the command's output and copy it. Then go to your GitHub Settings page and click the SSH and GPG keys option. There, select New SSH key, set a descriptive title for the key, make sure that the Key Type is set to Authentication Key, and paste the contents of id_ed25519.pub into the Key field. Finally, click the Add SSH key button.

At this point, you may be asked to provide some kind of Two-Factor Authentication (2FA) code. So, be ready for that extra security step.

Now we can test our connection by running the following command:

shell
$ ssh -T git@github.com
The authenticity of host 'github.com (IP ADDRESS)' can not be established.
ECDSA key fingerprint is SHA256:p2QAMXNIC1TJYWeIOttrVc98/R1BUFWu3/LiyKgUfQM.
Are you sure you want to continue connecting (yes/no/[fingerprint])?

Make sure to check whether the key fingerprint shown on your output matches GitHub's public key fingerprint. If it matches, then enter yes and press Enter to connect. Otherwise, don't connect.

If the connection is successful, we will get a message like this:

shell
Hi USERNAME! You have successfully authenticated, but GitHub does not provide shell access.

Congrats! You've successfully connected to GitHub via SSH using a secure SSH key. Now it's time to start working with GitHub.

Creating and Setting Up a GitHub Repository

Now that you have a GitHub account with a proper SSH connection, let's create a remote repository on GitHub using its web interface. Head over to the GitHub page and click the + icon next to your avatar in the top-right corner. Then select New repository.

Give your new repo a unique name and choose who can see this repository. To continue with our example, we can give this repository the same name as our local project, sample_project.

To avoid conflicts with your existing local repository, don't add .gitignore, README, or LICENSE files to your remote repository.

Next, set the repo's visibility to Private so that no one else can access the code. Finally, click the Create repository button at the end of the page.

If you create a Public repository, make sure also to choose an open-source license for your project to tell people what they can and can't do with your code.

You'll get a Quick setup page as your remote repository has no content yet. Right at the top, you'll have the choice to connect this repository via HTTPS or SSH. Copy the SSH link and run the following command to tell Git where the remote repository is hosted:

shell
$ git remote add origin git@github.com:USERNAME/sample_project.git

This command adds a new remote repository called origin to our local Git repo.

The name origin is commonly used to denote the main remote repository associated with a given project. This is the default name Git uses to identify the main remote repo.

Git allows us to add several remote repositories to a single local one using the git remote add command. This allows us to have several remote copies of our local Git repo.

Pushing a Local Git Repository to GitHub

With a new and empty GitHub repository in place, we can go ahead and push the content of our local repo to its remote copy. To do this, we use the git push command providing the target remote repo and the local branch as arguments:

shell
$ git push -u origin main
Enumerating objects: 9, done.
Counting objects: 100% (9/9), done.
Delta compression using up to 8 threads
Compressing objects: 100% (4/4), done.
Writing objects: 100% (9/9), 790 bytes | 790.00 KiB/s, done.
Total 9 (delta 0), reused 0 (delta 0), pack-reused 0
To github.com:USERNAME/sample_project.git
 * [new branch]      main -> main
branch 'main' set up to track 'origin/main'.

This is the first time we push something to the remote repo sample_project, so we use the -u option to tell Git that we want to set the local main branch to track the remote main branch. The command's output provides a pretty detailed summary of the process.

Note that if you don't add the -u option, then Git will ask what you want to do. A safe workaround is to copy and paste the commands GitHub suggests, so that you don't forget -u.

Using the same command, we can push any local branch to any remote copy of our project's repo. Just change the repo and branch name: git push -u remote_name branch_name.

Now let's head over to our browser and refresh the GitHub page. We will see all of our project files and commit history there.

Now we can continue developing our project and making new commits locally. To push our commits to the remote main branch, we just need to run git push. This time, we don't have to use the remote or branch name because we've already set main to track origin/main.

Pulling Content From a GitHub Repository

We can do basic file editing and make commits within GitHub itself. For example, if we click the main.py file and then click the pencil icon at the top of the file, we can add another line of code and commit those changes to the remote main branch directly on GitHub.

Go ahead and add the statement print("Your Git Tutorial is Here...") to the end of main.py. Then go to the end of the page and click the Commit changes button. This makes a new commit on your remote repository.

This remote commit won't appear in your local commit history. To download it and update your local main branch, use the git pull command:

shell
$ git pull
remote: Enumerating objects: 5, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (3/3), 696 bytes | 174.00 KiB/s, done.
From github.com:USERNAME/sample_project
   be62476..605b6a7  main       -> origin/main
Updating be62476..605b6a7
Fast-forward
 main.py | 1 +
 1 file changed, 1 insertion(+)

Again, the command's output provides all the details about the operation. Note that git pull will download the remote branch and update the local branch in a single step.

If we want to download the remote branch without updating the local one, then we can use the git fetch command. This gives us the chance to review the changes and merge them into our local branch only if they look right.

For example, go ahead and update the remote copy of main.py by adding another statement like print("Let's go!!"). Commit the changes. Then get back to your local repo and run the following command:

shell
$ git fetch
remote: Enumerating objects: 5, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (3/3), 731 bytes | 243.00 KiB/s, done.
From github.com:USERNAME/sample_project
   605b6a7..ba489df  main       -> origin/main

This command downloaded the latest changes from origin/main to our local repo. Now we can compare the remote copy of main.py to the local copy. To do this, we can use the git diff command as follows:

shell
$ git diff main origin/main
diff --git a/main.py b/main.py
index be2aa66..4f0e7cf 100644
--- a/main.py
+++ b/main.py
@@ -1,3 +1,4 @@
 print('Hello, World!')
 print('Welcome to PythonGUIs!')
 print("Your Git Tutorial is Here...")
+print("Let's go!!")

In the command's output, you can see that the remote branch adds a line containing print("Let's go!!") to the end of main.py. This change looks good, so we can use git pull to update our local branch automatically.

Exploring Alternatives to GitHub

While GitHub is the most popular public Git server and collaboration platform in use, it is far from the only one. GitLab and Bitbucket are popular commercial alternatives similar to GitHub. While they have paid plans, both offer free plans, with some restrictions, for individual users.

However, if you would like to use a completely open-source platform instead, Codeberg might be a good option. It's a community-driven alternative with a focus on supporting Free Software, so in order to use Codeberg, your project needs to use a compatible open-source license.

Optionally, you can also run your own Git server. While you could just use barebones git for this, software such as GitLab Community Edition (CE) and Forgejo provide you with both the benefits of running your own server and the experience of using a service like GitHub.

Conclusion

By now, you're able to use Git for version-controlling your projects. Git is a powerful tool that will make you much more efficient and productive, especially as the scale of your project grows over time.

While this guide introduced you to most of Git's basic concepts and common commands, Git has many more commands and options that can make you even more productive. Now you know enough to get up to speed with it.

March 20, 2023 06:00 AM UTC


Armin Ronacher

Lessons from a Pessimist: Make Your Pessimism Productive

This year I decided that I want to share my most important learnings about engineering, teams and quite frankly personal mental health. My hope is that those who want to learn from me find it useful.

I consider myself a functional and pragmatic pessimist. I tend to err on the side of anticipating the worst outcome most of the time. This mindset often leads me to assume that things are more difficult than they actually are, but it also highlights potential pitfalls along the way. In some ways, this is a coping mechanism, but it also aids in problem-solving and sets my expectations low, frequently resulting in pleasant surprises.

However, in recent years, I've more and more encountered a different kind of pessimism in others that I deem destructive. This type of pessimism sees no good in the world and renders people feeling powerless. I thought it might be worthwhile to share why I am not entirely consumed by gloom.

Destructive pessimism involves either wanting or expecting things to fail. At first glance, the aspect of not expecting success may appear similar to how I operate, but there's a subtle distinction. I generally anticipate that things will be challenging but still achievable, and when it matters, I want them to succeed. An extreme example of destructive pessimism on the other hand is expecting climate change to end the world and assuming society will do nothing to prevent it.

Whatever I personally do, I want it to be successful. I don't search for reasons why something won't work; instead, I focus on how to make it work while addressing or avoiding the issues I see along the way. That does not make me an optimist; it just makes me someone who wants to get stuff done and who strives for positive outcomes. Optimism, on the other hand, to me means expecting to succeed against all odds, something I do not do. I fully expect that there will be failure along the way. (I also love venting about stuff I don't like, even if it's not at all productive.)

Many individuals in today's economy worry about their retirement and harbor a general negative sentiment about nearly everything, from the unfairness of the labor market and increasing poverty to climate change and more. Believe it or not, I share much of this negative sentiment, but I've learned never to let such thoughts govern my life. Dwelling on negativity regarding your employer, job prospects, government, economy, or environment — especially when it's difficult to influence these aspects — leads to nothing but unhappiness and depression.

Our times are marked by a number of transformative events. A recent conversation I had with some folks about AI is quite illustrative of how you can be a pessimist yet still be excited and forward-looking. What's happening with AI at the moment makes a lot of people deeply uncomfortable. Some think that their jobs are at risk; others are trying to fight that future out of fear by attacking its foundations from all kinds of angles: copyright law, various moral arguments, and downplaying the current capabilities of AI. All of these things are absolutely worth considering! You might remember from a recent blog post about AI that I myself outlined some of the potential issues with it. Nevertheless, AI will continue to advance, and being afraid of it is simply unproductive. Rather than becoming despondent about AI, my pessimistic side assumes that things can go wrong and acts accordingly, all while giving the technology a fair chance.

I am absolutely convinced that it's important to recognize the difference between a pragmatic form of pessimism and destructive pessimism. And as cheesy as it sounds, try to surround yourself with supportive individuals who can help you maintain a positive outlook and try to be that person for others. You don't have to be an optimist for wanting to succeed!

March 20, 2023 12:00 AM UTC

March 19, 2023


ListenData

ChatGPT-4 Is a Smart Analyst, Unlike GPT-3.5

ChatGPT has been trending on social media platforms. It crossed one million users in just a week's time. For those who haven't heard of ChatGPT: it's a large language model trained by OpenAI. In simple words, it's a chatbot that answers your questions, and the responses it provides can sound human-like. It's an impressive machine learning solution. With the release of GPT-4, we can rely on it over Google search for learning about any topic.

Update: I updated this article with reviews of GPT-4: ChatGPT-4 vs ChatGPT-3.5.

Why ChatGPT-3.5 Isn't Smart Enough, but GPT-4 Is

You can't trust ChatGPT-3.5 for preparation for any certification or exam. It's a big NO if you think you can refer to ChatGPT-3.5 for answering questions in a telephonic interview round. Yes, I know it's cheating even to use Google for the same, but I wanted to give a WARNING, as many people do this, and many social media influencers have posted about how to leverage ChatGPT-3.5 for cracking interviews. After the release of GPT-4, I can confidently say that it will be a useful resource for exam preparation. GPT-4 is a huge improvement over GPT-3.5. It can stun you with its ability to create human-like responses with very high precision on facts and creativity.

READ MORE »

March 19, 2023 01:00 AM UTC


Brian Okken

Sharing is Caring - Sharing pytest Fixtures - PyCascades 2023

Slides and code for a talk at PyCascades 2023.

Talk page: Sharing is Caring - Sharing pytest Fixtures
Scheduled time: Sunday, March 19, 11:30 am

Summary: pytest rocks, obviously. When people start using pytest as a team, they often come up with cool fixtures that would be great to share across projects. In fact, many great Python packages come pre-loaded with pytest fixtures.
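
As a taste of the kind of sharing the talk covers (a generic sketch with hypothetical names, not taken from the talk materials), fixtures defined in a conftest.py are automatically available to every test in the directory:

# conftest.py: fixtures defined here are visible to all tests in the directory
import pytest

@pytest.fixture
def sample_user():
    user = {"name": "pat", "id": 123}
    yield user        # the test runs here
    user.clear()      # teardown runs after each test that used the fixture

# test_user.py: pytest injects the fixture by matching the parameter name
def test_user_has_name(sample_user):
    assert sample_user["name"] == "pat"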

March 19, 2023 12:00 AM UTC

March 18, 2023


Glyph Lefkowitz

Building And Distributing A macOS Application Written in Python

Why Bother With All This?

In other words: if you want to run on an Apple platform, why not just write everything in an Apple programming language, like Swift? If you need to ship to multiple platforms, you might have to rewrite it all anyway, so why not give up?

Despite the significant investment that platform vendors make in their tools, I fundamentally believe that the core logic in any software application ought to be where its most important value lies. For small, independent developers, having portable logic that can be faithfully replicated on every platform without massive rework might be tricky to get started with, but if you can’t do it, it may not be cost effective to support multiple platforms at all.

So, it makes sense for me to write my applications in Python to achieve this sort of portability, even though on each platform it’s going to be a little bit more of a hassle to get it all built and shipped since the default tools don’t account for the use of Python.

But how much more is “a little bit” more of a hassle? I’ve been slowly learning about the pipeline to ship independently-distributed macOS applications for the last few years, and I’ve encountered a ton of annoying roadblocks.

Didn’t You Do This Already? What’s New?

So nice of you to remember. Thanks for asking. While I’ve gotten this to mostly work in the past, some things have changed since then.

I’ve also recently shipped my first build of an end-user application that successfully launches on both Apple Silicon and Intel macs, so here is a brief summary of the hoops I needed to jump through, from the beginning, in order to make everything work.

Wait did you say you wrote a tool? Is this fixed, then?

Encrust is, I hope, a temporary stopgap on the way to a much better comprehensive solution.

Specifically, I believe that Briefcase is a much more holistic solution to the general problem being described here, but it doesn’t suit my very specific needs right now, and it doesn’t address a couple of minor points that I was running into here.

It is mostly glue that is shelling out to other tools that already solve portions of the problem, even when better APIs exist. It addresses three very specific layers of complexity:

  1. It enforces architecture independence, so that your app built on an M1 machine will still actually run on about half of the macs remaining out there.
  2. It remembers tricky nuances of the notarization submission process, such as the highly specific way I need to generate my zip files to avoid mysterious notarization rejections.
  3. It provides a common and central way to store the configuration for these things across repositories, so I don’t need to repeat this process and copy/paste a shell script every time I make a tiny new application.

It only works on Apple Silicon macs, because I didn’t bother to figure out how pip actually determines which architecture to download wheels for.

As such, unfortunately, Encrust is mostly a place for other people who have already solved this problem to collaborate to centralize this sort of knowledge and share ideas about where this code should ultimately go, rather than a tool for users trying to get started with shipping an app.

Open Offer

That said:

  1. I want to help Python developers ship their Python apps to users who are not also Python developers.
  2. macOS is an even trickier platform to do that on than most.
  3. It’s now easy for me to sign, notarize, and release new applications reliably.

Therefore:

If you have an open source Python application that runs on macOS5 but can’t ship to macOS — either because:

  1. you’ve gotten stuck on one of the roadblocks that this post describes,
  2. you don’t have $100 to give to Apple, or because
  3. the app is using a cross-platform toolkit that should work just fine and you don’t have access to a mac at all, then

Send me an email and I’ll sign and post your releases.

What’s this post about, then?

People still frequently complain that “Python packaging” is really bad. And I’m on record that packaging Python (in the sense of “code”) for Python (in the sense of “deployment platform”) is actually kind of fine right now; if what you’re trying to get to is a package that can be pip installed, you can have a reasonably good experience modulo a few small onboarding hiccups that are well-understood in the community and fairly easy to overcome.

However, it’s still unfortunately hard to get Python code into the hands of users who are not also Python programmers with their own development environments.

My goal here is to document the difficulties themselves to try to provide a snapshot of what happens if you try to get started from scratch today. I think it is useful to record all the snags and inscrutable error messages that you will hit in a row, so we can see what the experience really feels like.

I hope that everyone will find it entertaining, but the main audience is the maintainers of tools like Briefcase and py2app, who can evaluate the new-user experience holistically and see how close the use of their tools comes to this. This necessarily includes the parts of the process that are not actually packaging.

This is why I’m starting from the beginning again, and going through all the stuff that I’ve discussed in previous posts again, to present the whole experience.

Here Goes

So, with no further ado, here is a non-exhaustive list of frustrations that I have encountered in this process:

At this point I have an app bundle which kinda works. But in order to run on anyone else’s computer, I have to code-sign it.

Now my app bundle is signed! Hooray. 12 years ago, I’d be all set. But today I need some additional steps.

Hooray! Time to release my great app!

And we have an application bundle we can ship to users.

It’s just that easy.

As long as I don’t need sandboxing or Mac App Store distribution, of course. That’s a challenge for another day.


So, that was terrible. But what should be happening here?

Some of this is impossible to simplify beyond a certain point - many of the things above are not really about Python, but are about distribution requirements for macOS specifically, and we in the Python community can’t affect operating system vendors’ tooling.

What we can do is build tools that produce clear guidance on what step is required next, handle edge cases on their own, and generally guide users through these complex processes without requiring them to hit weird binary-format or cryptographic-signing errors on their own with no explanation of what to do next.

I do not think that documentation is the answer here. The necessary steps should be discoverable. If you need to go to a website, the tool should use the webbrowser module to open that website. If you need to launch an app, the tool should launch that app.
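For instance, the guidance step could be as small as this (a trivial sketch; the URL is only an example):

import webbrowser

# Open the page the user needs instead of telling them to go find it
webbrowser.open("https://developer.apple.com/account/")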

With Encrust, I am hoping to generalize the solutions that I found while working on this for this one specific slice of the app distribution pipeline — i.e. a macOS desktop application, as distributed independently and not through the Mac App Store — but other platforms will need the same treatment.

However, even without really changing py2app or any of the existing tooling, we could imagine a tool that would interactively prompt the user for each manual step, automate as much of it as possible, verify that it was performed correctly, and give comprehensible error messages if it was not.

For a lot of users, this full code-signing journey may not be necessary; if you just want to run your code on one or two friends’ computers, telling them to right click, go to ‘open’ and enter their password is not too bad. But it may not even be clear to them what the trade-off is, exactly; it looks like the app is just broken when you download it. The app build pipeline should tell you what the limitations are.

Other parts of this just need bug-fixes to address. py2app specifically, for example, could have a better self-test for its module-collecting behavior, launching an app to make sure it didn’t leave anything out.

Interactive prompts to set up a Homebrew tap, or a Flatpak build, or a Microsoft Store Metro app, might be similarly useful. These all have outside-of-Python required manual steps, and all of them are also amenable to at least partial automation.


Thanks to my patrons on Patreon for supporting this sort of work, including development of Encrust, of Pomodouroboros, of posts like this one and of that offer to sign other people’s apps. If you think this sort of stuff is worthwhile, you might want to consider supporting me over there as well.


  1. I am not even going to try to describe building a sandboxed, app-store ready application yet. 

  2. At least according to the Steam Hardware Survey, which as of this writing in March of 2023 pegs the current user-base at 54% apple silicon and 46% Intel. The last version I can convince the Internet Archive to give me, from December of 2022, has it closer to 51%/49%, which suggests a transition rate of 1% per month. I suspect that this is pretty generous to Apple Silicon as Steam users would tend to be earlier adopters and more sensitive to performance, but mostly I just don’t have any other source of data. 

  3. It is truly remarkable how bad the error reporting from the notarization service is. There are dozens of articles and forum posts around the web like this one where someone independently discovers this failure mode after successfully notarizing a dozen or so binaries and then suddenly being unable to do so any more because one of the bytes in the signature is suddenly not valid UTF-8 or something. 

  4. A lot of this is probably historical baggage; I started with py2app in 2008 or so, and I have been working on these apps in fits and starts for… ugh… 15 years. At some point when things are humming along and there are actual users, a more comprehensive retrofit of the build process might make sense, but right now I just want to stop thinking about this.

  5. If your application isn’t open source, or if it requires some porting work, I’m also available for light contract work, but it might take a while to get on my schedule. Feel free to reach out as well, but I am not looking to spend a lot of time doing porting work. 

  6. I find this particular detail interesting; it speaks to the complexity and depth of this problem space that this has been a known issue for several years in Briefcase, but there’s just so much other stuff to handle in the release pipeline that it remains open. 

  7. I forgot both .a files and the py2app-included python executable itself here, and had to discover that gap when I signed a different app where that made a difference. 

  8. Thus far, it seems to be working. 

March 18, 2023 09:45 PM UTC


Hynek Schlawack

How to Automatically Switch to Rosetta With Fish and Direnv

I love my Apple silicon computer, but having to manually switch to Rosetta-enabled shells for my Intel-only projects was a bummer.

March 18, 2023 12:00 PM UTC


Talk Python to Me

#407: pytest tips and tricks for better testing

If you're like most people, the simplicity and ease of getting started is a big part of pytest's appeal. But beneath that simplicity, there is a lot of power and depth. We have Brian Okken on this episode to dive into his latest pytest tips and tricks for beginners and power users.

Links from the show:

  pytest tips and tricks article: https://pythontest.com/pytest-tips-tricks/
  Getting started with pytest Course: https://training.talkpython.fm/courses/getting-started-with-testing-in-python-using-pytest
  pytest book: https://pythontest.com/pytest-book/
  Python Bytes podcast: https://pythonbytes.fm
  Brian on Mastodon: @brianokken@fosstodon.org
  Hypothesis: https://hypothesis.readthedocs.io/en/latest/
  Hypothesis: Reproducibility: https://hypothesis.readthedocs.io/en/latest/reproducing.html
  Get More Done with the DRY Principle: https://zapier.com/blog/dont-repeat-yourself/
  "The Key" Keyboard: https://stackoverflow.blog/2021/03/31/the-key-copy-paste/
  pytest plugins: https://docs.pytest.org/en/7.1.x/reference/plugin_list.html
  Watch this episode on YouTube: https://www.youtube.com/watch?v=qQ6b7OwT124
  Episode transcripts: https://talkpython.fm/episodes/transcript/407/pytest-tips-and-tricks-for-better-testing

Sponsors:

  Microsoft Founders Hub 2023: https://talkpython.fm/foundershub
  Brilliant 2023: https://talkpython.fm/brilliant
  Talk Python Training: https://talkpython.fm/training

March 18, 2023 08:00 AM UTC


Matt Layman

Locking Down Your Users' Secrets: Django Sessions 101

Django is a powerful and popular web framework that makes it easy to build robust and secure web applications. One of the key features of Django is its ability to manage user sessions, which are essential for many web applications. However, you may be wondering if Django sessions are secure. In this article, we’ll explore the security of Django sessions and see how they can be made even more secure.
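As a preview of the kind of hardening the article discusses, these are standard Django settings that tighten session cookie handling (the specific values below are illustrative, not the article's recommendations):

# settings.py -- illustrative session-hardening values
SESSION_COOKIE_SECURE = True      # only send the session cookie over HTTPS
SESSION_COOKIE_HTTPONLY = True    # hide the cookie from JavaScript
SESSION_COOKIE_SAMESITE = "Lax"   # limit cross-site cookie sending
SESSION_COOKIE_AGE = 60 * 60 * 2  # expire sessions after two hours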

March 18, 2023 12:00 AM UTC

March 17, 2023


Stack Abuse

DBSCAN with Scikit-Learn in Python

Introduction

You are working at a consulting company as a data scientist. The project you are currently assigned to has data from students who have recently finished courses about finances. The financial company that conducts the courses wants to understand if there are common factors that influence students to purchase the same courses or to purchase different courses. By understanding those factors, the company can create a student profile, classify each student by profile, and recommend a list of courses.

When inspecting data from different student groups, you've come across three arrangements of points, as in plots 1, 2 and 3 below:

Notice that in plot 1, there are purple points organized in a half circle, with a mass of pink points inside that circle, a little concentration of orange points outside of that semi-circle, and five gray points that are far from all others.

In plot 2, there is a round mass of purple points, another of orange points, and also four gray points that are far from all the others.

And in plot 3, we can see four concentrations of points, purple, blue, orange, pink, and three more distant gray points.

Now, if you were to choose a model that could understand new student data and determine similar groups, is there a clustering algorithm that could give interesting results to that kind of data?

When describing the plots, we mentioned terms like mass of points and concentration of points, indicating that there are areas in all graphs with greater density. We also referred to circular and semi-circular shapes, which are difficult to identify by drawing a straight line or merely examining the closest points. Additionally, there are some distant points that likely deviate from the main data distribution, introducing more challenges or noise when determining the groups.

A density-based algorithm that can filter out noise, such as DBSCAN (Density-Based Spatial Clustering of Applications with Noise), is a strong choice for situations with denser areas, rounded shapes, and noise.

About DBSCAN

DBSCAN is one of the most cited clustering algorithms in research; its first publication appeared in 1996 in the original DBSCAN paper. In the paper, researchers demonstrate how the algorithm can identify non-linear spatial clusters and handle large databases efficiently.

The main idea behind DBSCAN is that there is a minimum number of points that will be within a determined distance or radius from the most "central" cluster point, called core point. The points within that radius are the neighborhood points, and the points on the edge of that neighborhood are the border points or boundary points. The radius or neighborhood distance is called epsilon neighborhood, ε-neighborhood or just ε (the symbol for Greek letter epsilon).

Additionally, when there are points that aren't core points or border points because they exceed the radius for belonging to a determined cluster and also don't have the minimum number of points to be a core point, they are considered noise points.

This means we have three different types of points, namely, core, border and noise. Furthermore, it is important to note that the main idea is fundamentally based on a radius or distance, which makes DBSCAN - like most clustering models - dependent on that distance metric. This metric could be Euclidean, Manhattan, Mahalanobis, and many more. Therefore, it is crucial to choose an appropriate distance metric that considers the context of the data. For instance, if you are using driving distance data from a GPS, it might be interesting to use a metric that takes the street layouts into consideration, such as Manhattan distance.
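For reference, Scikit-Learn's DBSCAN (used later in this guide) exposes the distance function through its metric parameter; a brief sketch with Manhattan distance (the eps and min_samples values here are placeholders):

from sklearn.cluster import DBSCAN

# metric switches the distance function; eps and min_samples are placeholder values
db_manhattan = DBSCAN(eps=0.3, min_samples=5, metric='manhattan')
# db_manhattan.fit(X)  # X would be your scaled feature matrix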

Note: Since DBSCAN maps the points that constitute noise, it can also be used as an outlier detection algorithm. For instance, if you are trying to determine which bank transactions may be fraudulent and the rate of fraudulent transactions is small, DBSCAN might be a solution to identify those points.

To find the core point, DBSCAN will first select a point at random, map all the points within its ε-neighborhood, and compare the number of neighbors of the selected point to the minimum number of points. If the selected point has an equal number or more neighbors than the minimum number of points, it will be marked as a core point. This core point and its neighborhood points will constitute the first cluster.

The algorithm will then examine each point of the first cluster and see if it has an equal number or more neighbor points than the minimum number of points within ε. If it does, those neighbor points will also be added to the first cluster. This process will continue until the points of the first cluster have fewer neighbors than the minimum number of points within ε. When that happens, the algorithm stops adding points to that cluster, identifies another core point outside of that cluster, and creates a new cluster for that new core point.

DBSCAN will then repeat the first cluster process of finding all points connected to a new core point of the second cluster until there are no more points to be added to that cluster. It will then encounter another core point and create a third cluster, or it will iterate through all the points that it hasn't previously looked at. If these points are at ε distance from a cluster, they are added to that cluster, becoming border points. If they aren't, they are considered noise points.
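To make the procedure above concrete, here is a minimal, unoptimized sketch of that expansion logic (a toy illustration, not Scikit-Learn's actual implementation; points is assumed to be an (n, d) NumPy array):

import numpy as np

def dbscan_sketch(points, eps, min_points):
    # Toy DBSCAN: returns one label per point; -1 marks noise
    n = len(points)
    labels = [None] * n                # None = not visited yet
    cluster_id = -1

    def neighbors(i):
        # Indices of all points within eps of point i (Euclidean distance)
        dists = np.linalg.norm(points - points[i], axis=1)
        return [j for j in range(n) if dists[j] <= eps]

    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_points:
            labels[i] = -1             # provisionally noise; may become a border point
            continue
        cluster_id += 1                # i is a core point: start a new cluster
        labels[i] = cluster_id
        queue = list(nbrs)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id # noise reached from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_points:
                queue.extend(j_nbrs)   # j is also a core point: keep expanding
    return labels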

Advice: There are many rules and mathematical proofs behind DBSCAN; if you want to dig deeper, you may want to take a look at the original paper, which is linked above.

It is interesting to know how the DBSCAN algorithm works, although, fortunately, there is no need to code the algorithm yourself, since Python's Scikit-Learn library already has an implementation.

Let's see how it works in practice!

Importing Data for Clustering

To see how DBSCAN works in practice, we will change projects a bit and use a small customer dataset that has the genre, age, annual income, and spending score of 200 customers.

The spending score ranges from 1 to 100 and represents how often a person spends money in a mall. In other words, if a customer has a score of 1, they almost never spend money, and if the score is 100, they are the highest spender.

Note: You can download the dataset here.

After downloading the dataset, you will see that it is a CSV (comma-separated values) file called shopping-data.csv. We'll load it into a DataFrame using Pandas and store it in the customer_data variable:

import pandas as pd

# Substitute the path_to_file content by the path to your csv file 
path_to_file = '../../datasets/dbscan/dbscan-with-python-and-scikit-learn-shopping-data.csv'
customer_data = pd.read_csv(path_to_file)

To take a look at the first five rows of our data, you can execute customer_data.head():

This results in:

    CustomerID  Genre   Age     Annual Income (k$)  Spending Score (1-100)
0   1           Male    19      15                  39
1   2           Male    21      15                  81
2   3           Female  20      16                  6
3   4           Female  23      16                  77
4   5           Female  31      17                  40

By examining the data, we can see customer ID numbers, genre, age, incomes in k$, and spending scores. Keep in mind that some or all of these variables will be used in the model. For example, if we were to use Age and Spending Score (1-100) as variables for DBSCAN, which uses a distance metric, it's important to bring them to a common scale to avoid introducing distortions since Age is measured in years and Spending Score (1-100) has a limited range from 0 to 100. This means that we will perform some kind of data scaling.

We can also check whether the data needs any more preprocessing aside from scaling, by seeing if the data types are consistent and verifying whether there are any missing values that need to be treated, using Pandas' info() method:

customer_data.info()

This displays:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 5 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   CustomerID              200 non-null    int64 
 1   Genre                   200 non-null    object
 2   Age                     200 non-null    int64 
 3   Annual Income (k$)      200 non-null    int64 
 4   Spending Score (1-100)  200 non-null    int64 
dtypes: int64(4), object(1)
memory usage: 7.9+ KB

We can observe that there are no missing values because there are 200 non-null entries for each customer feature. We can also see that only the genre column has text content, as it is a categorical variable, which is displayed as object, and all other features are numeric, of the type int64. Thus, in terms of data type consistency and absence of null values, our data is ready for further analysis.

We can proceed to visualize the data and determine which features would be interesting to use in DBSCAN. After selecting those features, we can scale them.

This customer dataset is the same as the one used in our definitive guide to hierarchical clustering. To learn more about this data, how to explore it, and about distance metrics, you can take a look at Definitive Guide to Hierarchical Clustering with Python and Scikit-Learn!

Visualizing Data

By using Seaborn's pairplot(), we can plot a scatter graph for each combination of features. Since CustomerID is just an identification and not a feature, we will remove it with drop() prior to plotting:

import seaborn as sns

# dropping CustomerID column from data 
customer_data = customer_data.drop('CustomerID', axis=1)

sns.pairplot(customer_data);

This outputs:

When looking at the combination of features produced by pairplot, the graph of Annual Income (k$) with Spending Score (1-100) seems to display around 5 groups of points. This seems to be the most promising combination of features. We can create a list with their names, select them from the customer_data DataFrame, and store the selection in the customer_data variable again for use in our future model.

selected_cols = ['Annual Income (k$)', 'Spending Score (1-100)']
customer_data = customer_data[selected_cols]

After selecting the columns, we can perform the scaling discussed in the previous section. To bring the features to the same scale or standardize them, we can import Scikit-Learn's StandardScaler, create it, fit our data to calculate its mean and standard deviation, and transform the data by subtracting its mean and dividing it by the standard deviation. This can be done in one step with the fit_transform() method:

from sklearn.preprocessing import StandardScaler

ss = StandardScaler() # creating the scaler
scaled_data = ss.fit_transform(customer_data)

The variables are now scaled, and we can examine them by simply printing the content of the scaled_data variable. Alternatively, we can also add them to a new scaled_customer_data DataFrame along with column names and use the head() method again:

scaled_customer_data = pd.DataFrame(columns=selected_cols, data=scaled_data)
scaled_customer_data.head()

This outputs:

  Annual Income (k$)    Spending Score (1-100)
0         -1.738999                 -0.434801
1         -1.738999                  1.195704
2         -1.700830                 -1.715913
3         -1.700830                  1.040418
4         -1.662660                 -0.395980 

This data is ready for clustering! When introducing DBSCAN, we mentioned the minimum number of points and the epsilon. These two values need to be selected prior to creating the model. Let's see how it's done.

Choosing Min. Samples and Epsilon

To choose the minimum number of points for DBSCAN clustering, there is a rule of thumb which states that it has to be equal to or higher than the number of dimensions in the data plus one, as in:

$$
\text{min. points} \geq \text{data dimensions} + 1
$$

The dimensions are the number of columns in the dataframe. We are using 2 columns, so the min. points should be 2+1, which is 3, or higher. For this example, let's use 5 min. points.

$$
5 \text{ (min. points)} \geq 2 \text{ (data dimensions)} + 1
$$

Now, to choose the value for ε, there is a method in which a Nearest Neighbors algorithm is employed to find the distances of a predefined number of nearest points for each point. This predefined number of neighbors is the min. points we have just chosen, minus 1. So, in our case, the algorithm will find the 5-1, or 4, nearest points for each point of our data. Those are the k-neighbors, and our k equals 4.

$$
\text{k-neighbors} = \text{min. points} - 1
$$

Advice: to learn more about Nearest Neighbors, read our K Nearest Neighbors algorithm in Python and Scikit-learn guide.

After finding the neighbors, we will sort their distances from smallest to largest and plot the distances on the y-axis and the points on the x-axis. Looking at the plot, we will find where it resembles the bend of an elbow, and the y-axis value at that elbow bend is the suggested ε value.

Note: it is possible that the graph for finding the ε value has one or more "elbow bends", either big or small. When that happens, you can find the candidate values, test them, and choose the one whose results best describe the clusters, either by looking at metrics or at plots.

To perform these steps, we can import the algorithm, fit it to the data, and then extract the distances and indices of each point with the kneighbors() method:

from sklearn.neighbors import NearestNeighbors
import numpy as np

nn = NearestNeighbors(n_neighbors=4) # minimum points -1
nbrs = nn.fit(scaled_customer_data)
distances, indices = nbrs.kneighbors(scaled_customer_data)

Note: Just like with DBSCAN, it is essential to choose a distance metric that suits your data when using the Nearest Neighbors algorithm, as it is also distance-based.

For a list of some metrics and explanations on when to use them, you can take a look at Definitive Guide to Hierarchical Clustering with Python and Scikit-Learn.

After finding the distances, we can sort them from smallest to largest. Since the distances array's first column holds each point's distance to itself (meaning all are 0), the second column contains the distances to the nearest neighbor, followed by the third column, which has larger distances than the second, and so on, we can pick only the values of the second column and store them in the distances variable:

distances = np.sort(distances, axis=0)
distances = distances[:,1] # Keeping only the distances to the nearest neighbor

Now that we have our sorted smallest distances, we can import matplotlib, plot the distances, and draw a red line on where the "elbow bend" is:

import matplotlib.pyplot as plt

plt.figure(figsize=(6,3))
plt.plot(distances)
plt.axhline(y=0.24, color='r', linestyle='--', alpha=0.4) # elbow line
plt.title('Kneighbors distance graph')
plt.xlabel('Data points')
plt.ylabel('Epsilon value')
plt.show();

This is the result:

Notice that by drawing the line, we can read off the ε value; in this case, it is 0.24.

We finally have our minimum points and ε. With both variables, we can create and run the DBSCAN model.

Creating a DBSCAN Model

To create the model, we can import it from Scikit-Learn, create it with ε, which corresponds to the eps argument, and with the minimum points, which corresponds to the min_samples argument. We can then store it in a variable, let's call it dbs, and fit it to the scaled data:

from sklearn.cluster import DBSCAN

# min_samples == minimum points ≥ dataset_dimensions + 1
dbs = DBSCAN(eps=0.24, min_samples=5)
dbs.fit(scaled_customer_data)

Just like that, our DBSCAN model has been created and trained on the data! To extract the results, we access the labels_ property. We can also create a new labels column in the scaled_customer_data dataframe and fill it with the predicted labels:

labels = dbs.labels_

scaled_customer_data['labels'] = labels
scaled_customer_data.head()

This is the final result:

  Annual Income (k$)    Spending Score (1-100)  labels
0         -1.738999                 -0.434801       -1
1         -1.738999                  1.195704        0
2         -1.700830                 -1.715913       -1
3         -1.700830                  1.040418        0
4         -1.662660                 -0.395980       -1

Observe that we have labels with -1 values; these are the noise points, the ones that don't belong to any cluster. To know how many noise points the algorithm found, we can count how many times the value -1 appears in our labels list:

labels_list = list(scaled_customer_data['labels'])
n_noise = labels_list.count(-1)
print("Number of noise points:", n_noise)

This outputs:

Number of noise points: 62

We already know that 62 of our 200 original data points were considered noise. This is a lot of noise, which indicates that perhaps the DBSCAN clustering didn't consider many points as part of a cluster. We will understand what happened soon, when we plot the data.

Initially, when we observed the data, it seemed to have 5 clusters of points. To know how many clusters DBSCAN has formed, we can count the number of distinct labels that are not -1. There are many ways to write that code; here, we have written a for loop, which will also work for data in which DBSCAN has found many clusters:

total_labels = np.unique(labels)

n_labels = 0
for n in total_labels:
    if n != -1:
        n_labels += 1
print("Number of clusters:", n_labels)

This outputs:

Number of clusters: 6
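As an aside, a more compact idiom that yields the same count builds a set of the unique labels and discounts the noise label:

# Equivalent one-liner: unique labels minus the noise label, if present
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("Number of clusters:", n_clusters)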

We can see that the algorithm predicted the data to have 6 clusters, with many noise points. Let's visualize that by plotting it with seaborn's scatterplot:

sns.scatterplot(data=scaled_customer_data, 
                x='Annual Income (k$)', y='Spending Score (1-100)', 
                hue='labels', palette='muted').set_title('DBSCAN found clusters');

This results in:

Looking at the plot, we can see that DBSCAN has captured the points which were more densely connected, and points that could be considered part of the same cluster were either noise or considered to form another smaller cluster.

If we highlight the clusters, notice how DBSCAN captures cluster 1 completely, which is the cluster with the least space between points. Then it gets the parts of clusters 0 and 3 where the points are close together, considering the more spaced-out points as noise. It also considers the points in the lower left half as noise and splits the points in the lower right into 3 clusters, once again capturing clusters 4, 2, and 5 where the points are closer together.

We can start to conclude that DBSCAN was great at capturing the dense areas of the clusters, but not so great at identifying the bigger scheme of the data: the delimitations of the 5 clusters. It would be interesting to test more clustering algorithms on our data. Let's see if a metric corroborates this hypothesis.

Evaluating the Algorithm

To evaluate DBSCAN, we will use the silhouette score, which takes into consideration the distance between points of the same cluster and the distances between clusters.

Note: Currently, most clustering metrics aren't really suited to evaluating DBSCAN because they aren't based on density. Here, we are using the silhouette score because it is already implemented in Scikit-Learn and because it tries to look at cluster shape.

To have a more fitted evaluation, you can use or combine it with the Density-Based Clustering Validation (DBCV) metric, which was designed specifically for density-based clustering. There is an implementation for DBCV available on this GitHub.

First, we can import silhouette_score from Scikit-Learn, then, pass it our columns and labels:

from sklearn.metrics import silhouette_score

s_score = silhouette_score(scaled_customer_data, labels)
print(f"Silhouette coefficient: {s_score:.3f}")

This outputs:

Silhouette coefficient: 0.506

The silhouette coefficient ranges from -1 to 1, where higher values indicate denser, better-separated clusters. A score of 0.506 suggests that the clustering is only moderately good, which matches what we saw in the plot: DBSCAN captured the dense cores well but marked many points as noise.

Conclusion

DBSCAN Advantages and Disadvantages

DBSCAN is a rather unique clustering algorithm.

If we look at its advantages, it is very good at picking up dense areas in data and points that are far from others. This means that the data doesn't have to have a specific shape and can be surrounded by other points, as long as they are also densely connected.

It requires us to specify minimum points and ε, but there is no need to specify the number of clusters beforehand, as in K-Means, for instance. It can also be used with very large databases, since it was designed with large spatial databases in mind.

As for its disadvantages, we have seen that it couldn't capture different densities in the same cluster, so it has a hard time with large differences in densities. It is also dependent on the distance metric and the scaling of the points. This means that if the data isn't well understood, with features on different scales or a distance metric that doesn't fit the context, DBSCAN will probably fail to capture its structure.

DBSCAN Extensions

There are other algorithms, such as Hierarchical DBSCAN (HDBSCAN) and Ordering Points To Identify the Clustering Structure (OPTICS), which are considered extensions of DBSCAN.

Both HDBSCAN and OPTICS can usually perform better when there are clusters of varying densities in the data, and they are also less sensitive to the choice of the initial min. points and ε parameters.
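Scikit-Learn ships an OPTICS implementation with a familiar API, so a quick experiment on our data is straightforward (a minimal sketch; at the time of writing, HDBSCAN lives in the separate hdbscan package):

from sklearn.cluster import OPTICS

# OPTICS only requires min_samples; a fixed eps does not have to be chosen up front
optics = OPTICS(min_samples=5)
optics.fit(scaled_customer_data)
print(optics.labels_[:10])  # cluster labels, with -1 marking noise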

March 17, 2023 04:22 PM UTC


Mike Driscoll

Python’s Built-in Functions – The all() Function (Video)

This is the next video in my Python Built-ins Series.

Did you know Python has an all() function? Do you know what to use the all() function for?

Find out today by watching this short video!
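As a quick taste of what the video covers, all() returns True only when every element of an iterable is truthy:

scores = [88, 92, 75]

print(all(scores))                   # True: every element is truthy
print(all(s >= 80 for s in scores))  # False: 75 fails the check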

 

More Videos in the series

The post Python’s Built-in Functions – The all() Function (Video) appeared first on Mouse Vs Python.

March 17, 2023 02:47 PM UTC


Python for Beginners

Pandas DataFrame to List in Python

Python lists and dataframes are two of the most used data structures in Python. While we use Python lists to handle sequential data, dataframes are used to handle tabular data. In this article, we will discuss different ways to convert a pandas dataframe to a list in Python.

Convert Pandas DataFrame to a List of Rows

Each row in a pandas dataframe is stored as a series object with column names of the dataframe as the index and the values of the rows as associated values. 

To convert a dataframe to a list of rows, we can use the iterrows() method and a for loop. The iterrows() method, when invoked on a dataframe, returns an iterator. The iterator yields each row as a tuple having the row index as its first element and a series containing the row data as its second element. We can iterate through the iterator to access all the rows.

To create a list of rows from the dataframe using the iterator, we will use the following steps.

  1. First, create an empty list to store the rows.
  2. Then, iterate through the rows of the dataframe using the iterrows() method and a for loop.
  3. While iterating, append each row to the list using the append() method.

After execution of the for loop, we will get the output list of rows. You can observe this in the following example.

import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
        {"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90}]
df=pd.DataFrame(myDicts)
print("The original dataframe is:")
print(df)
rowList=list()
for index,row in df.iterrows():
    rowList.append(row)
print("The list of rows is:")
print(rowList)

Output:

The original dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       80         90
1     2     80      100         90
2     3     90       80         70
3     4    100      100         90
The list of rows is:
[Roll           1
Maths        100
Physics       80
Chemistry     90
Name: 0, dtype: int64, Roll           2
Maths         80
Physics      100
Chemistry     90
Name: 1, dtype: int64, Roll          3
Maths        90
Physics      80
Chemistry    70
Name: 2, dtype: int64, Roll           4
Maths        100
Physics      100
Chemistry     90
Name: 3, dtype: int64]

In this example, you can observe that we have created a list of rows from the dataframe. You can observe that the elements of the list are series objects and not arrays representing the rows.

Pandas DataFrame to List of Arrays in Python

Instead of creating a list of rows, we can create a list of arrays containing the row values from the dataframe. For this, we will extract the values of the dataframe using the values attribute. The values attribute of the dataframe contains a 2-D array containing the row values of the dataframe.

Once we get the values from the dataframe, we will convert the 2-D array to a list of arrays using the list() function. The list() function takes the array as its input and returns a list of its rows, as shown below.

import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
        {"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90}]
df=pd.DataFrame(myDicts)
print("The original dataframe is:")
print(df)
rowList=list(df.values)
print("The list of rows is:")
print(rowList)

Output:

The original dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       80         90
1     2     80      100         90
2     3     90       80         70
3     4    100      100         90
The list of rows is:
[array([  1, 100,  80,  90]), array([  2,  80, 100,  90]), array([ 3, 90, 80, 70]), array([  4, 100, 100,  90])]

In this example, you can observe that we have created a list of arrays from the dataframe.

Convert Pandas DataFrame to a List of Lists

Instead of creating a list of arrays, we can also convert pandas dataframe into a list of lists. For this, we can use two approaches.

Pandas DataFrame to a List of Lists Using iterrows() Method

To convert a dataframe into a list of lists, we will use the following approach.

  1. First, create an empty list to store the rows.
  2. Then, iterate through the rows of the dataframe using the iterrows() method and a for loop.
  3. While iterating, convert each row to a list using the tolist() method and append it to the output list.

After execution of the for loop, the pandas dataframe is converted to a list of lists. You can observe this in the following example.

import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
        {"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90}]
df=pd.DataFrame(myDicts)
print("The original dataframe is:")
print(df)
rowList=list()
for index,row in df.iterrows():
    rowList.append(row.tolist())
print("The list of rows is:")
print(rowList)

Output:

The original dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       80         90
1     2     80      100         90
2     3     90       80         70
3     4    100      100         90
The list of rows is:
[[1, 100, 80, 90], [2, 80, 100, 90], [3, 90, 80, 70], [4, 100, 100, 90]]

Using tolist() Method And The Values Attribute

Instead of using the iterrows() method and the for loop, we can directly convert the pandas dataframe to a list of lists using the values attribute. For this, we will first obtain the values in the data frame using the values attribute. Next, we will invoke the tolist() method on the values. This will give us the list of lists created from the dataframe. You can observe this in the following example.

import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
        {"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90}]
df=pd.DataFrame(myDicts)
print("The original dataframe is:")
print(df)
rowList=df.values.tolist()
print("The list of rows is:")
print(rowList)

Output:

The original dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       80         90
1     2     80      100         90
2     3     90       80         70
3     4    100      100         90
The list of rows is:
[[1, 100, 80, 90], [2, 80, 100, 90], [3, 90, 80, 70], [4, 100, 100, 90]]

Get a List of Column Names From Dataframe

To get a list of column names from a dataframe, you can use the columns attribute. The columns attribute of a dataframe contains an Index object with all the column names. You can observe this in the following example.

import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
        {"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90}]
df=pd.DataFrame(myDicts)
print("The original dataframe is:")
print(df)
nameList=df.columns
print("The list of column names is:")
print(nameList)

Output:

The original dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       80         90
1     2     80      100         90
2     3     90       80         70
3     4    100      100         90
The list of column names is:
Index(['Roll', 'Maths', 'Physics', 'Chemistry'], dtype='object')

Alternatively, you can pass the entire dataframe to the list() function. When we pass a dataframe to the list() function, it returns a list containing the columns of the dataframe. You can observe this in the following example.

import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
        {"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90}]
df=pd.DataFrame(myDicts)
print("The original dataframe is:")
print(df)
nameList=list(df)
print("The list of column names is:")
print(nameList)

Output:

The original dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       80         90
1     2     80      100         90
2     3     90       80         70
3     4    100      100         90
The list of column names is:
['Roll', 'Maths', 'Physics', 'Chemistry']

Convert Dataframe Column to a List in Python

To convert a dataframe column to a list, you can use the tolist() method as shown in the following example.

import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
        {"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90}]
df=pd.DataFrame(myDicts)
print("The original dataframe is:")
print(df)
rollList=df["Roll"].tolist()
print("The list of Roll column is:")
print(rollList)

Output:

The original dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       80         90
1     2     80      100         90
2     3     90       80         70
3     4    100      100         90
The list of Roll column is:
[1, 2, 3, 4]

In this example, you can observe that we have used the tolist() method to convert a dataframe column to a list.

Conclusion

In this article, we discussed different ways to convert a pandas dataframe to a list in Python. We also discussed how to convert the dataframe to a list of rows as well as a list of lists. To learn more about Python programming, you can read this article on Dataframe Constructor Not Properly Called Error in Pandas. You might also like this article on how to split a string into characters in Python.

I hope you enjoyed reading this article. Stay tuned for more informative articles.

Happy Learning!

The post Pandas DataFrame to List in Python appeared first on PythonForBeginners.com.

March 17, 2023 01:00 PM UTC


Real Python

The Real Python Podcast – Episode #149: Coding With namedtuple & Python's Dynamic Superpowers

Have you explored Python's collections module? Within it, you'll find a powerful factory function called namedtuple(), which provides multiple enhancements over the standard tuple for writing clearer and cleaner code. This week on the show, Christopher Trudeau is here, bringing another batch of PyCoder's Weekly articles and projects.
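For a flavor of what's discussed, here is a minimal namedtuple() example from the standard library's collections module:

from collections import namedtuple

# A lightweight, immutable record type with named fields
Point = namedtuple("Point", ["x", "y"])

p = Point(x=2, y=3)
print(p.x, p.y)         # 2 3
print(p._replace(x=5))  # Point(x=5, y=3) -- returns a new namedtuple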



March 17, 2023 12:00 PM UTC


eGenix.com

PyDDF Python Spring Sprint 2023

This is an announcement for a Python sprint in Düsseldorf, Germany.

Announcement

Python Meeting Spring Sprint 2023 in
Düsseldorf

Saturday, 25.03.2023, 10:00-18:00
Sunday, 26.03.2023, 10:00-18:00

Atos Information Technology GmbH, Am Seestern 1, 40547 Düsseldorf

Information

The Python Meeting Düsseldorf (PyDDF) is organizing a Python sprint weekend with the kind support of Atos Information Technology GmbH.

The sprint takes place on the weekend of 25./26.03.2023 at the Atos office, Am Seestern 1, in Düsseldorf. Several topic areas have already been suggested as starting points; of course, participants can propose and work on further topics.

Registration, costs and further information

All further details and the registration can be found on the Meetup sprint page:

IMPORTANT: Without registration we cannot prepare building access, so a spontaneous registration on the day of the sprint will most likely not work. Please be sure to register with your full name via Meetup by Friday, 24.03, at the latest.

Participants should also register in the PyDDF Telegram group, since that is where we coordinate:

About the Python Meeting Düsseldorf

The Python Meeting Düsseldorf is a regular event in Düsseldorf aimed at Python enthusiasts from the region.

Our PyDDF YouTube channel, where we publish videos of the talks after the meetings, offers a good overview of the talks.

The meeting is organized by eGenix.com GmbH, Langenfeld, in cooperation with Clark Consulting & Research, Düsseldorf.

Have fun!

Marc-André Lemburg, eGenix.com

March 17, 2023 09:00 AM UTC


Python Engineering at Microsoft

Introducing the Data Wrangler extension for Visual Studio Code Insiders

We’re excited to announce the launch of Data Wrangler, a revolutionary tool for data scientists and analysts who work with tabular data in Python. Data Wrangler is an extension for VS Code Insiders and the first step towards our vision of simplifying and expediting the data preparation process on Microsoft platforms.

Data preparation, cleaning, and visualization is a time-consuming task for many data scientists, but with Data Wrangler we’ve developed a solution that simplifies this process. Our goal is to make this process more accessible and efficient for everyone, to free up your time to focus on other parts of the data science workflow. To try Data Wrangler today, go to the Extension Marketplace tab in VS Code Insiders and search for “Data Wrangler”. To learn more about Data Wrangler, check out the documentation here: https://aka.ms/datawrangler.

With Data Wrangler, you can seamlessly clean and explore your data in VS Code Insiders. It offers a variety of features that will help you quickly identify and fix errors, inconsistencies, and missing data. You can perform data profiling and data quality checks, visualize data distributions, and easily transform data into the format you need. Plus, Data Wrangler comes with a library of built-in transformations and visualizations, so you can focus on your data, not the code. As you make changes, the tool generates code using open-source Python libraries for the data transformation operations you perform. This means you can write better data preparation programs faster and with fewer errors. The code also keeps Data Wrangler transparent and helps you verify the correctness of the operation as you go.

Data Wrangler operation

In a recent study, Python data scientists using the Pandas dataframe library reported spending the majority (~51%) of their time preparing, cleaning and visualizing data for their models (Anaconda State of Data Science Report 2022). This activity is critical to the success of their projects, as poor data quality directly impacts the quality of the predictions made by their models. Furthermore, this activity is not predictable: the industry even calls it exploratory data analysis to capture the fact that it is often highly creative, requiring experimentation, visualization, comparison and iteration. However, despite the activity being creative and iterative, the individual operations are not – they involve writing small code snippets that drop columns, remove missing values, etc. But today there isn't tooling support that makes this easier; in our research with data scientists, we regularly see them searching for and copy-pasting snippets of code from Stack Overflow into their programs.

Data Wrangler Interface

With Data Wrangler, we’ve developed an interactive UI that writes the code for you. As you inspect and visualize your Pandas dataframes using Data Wrangler, generating the code for your desired operations is easy. For instance, if you want to remove a column, you can right-click on the column heading and delete it, and Data Wrangler will generate the Python code to do that. If you want to remove rows containing missing values or substitute them with a computed default value, you can do that directly from the UI. If you want to reformat a categorical column by one-hot encoding it to make it suitable for machine learning algorithms, you can do so with a single command.
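As a rough illustration (not necessarily Data Wrangler's exact output; the column names are made up), the generated pandas code for operations like these might look as follows:

import pandas as pd

def clean_data(df: pd.DataFrame) -> pd.DataFrame:
    # Drop an unneeded column
    df = df.drop(columns=["notes"])
    # Substitute missing values with a computed default
    df["age"] = df["age"].fillna(df["age"].median())
    # One-hot encode a categorical column for ML algorithms
    df = pd.get_dummies(df, columns=["country"])
    return df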

Create column from examples

Data scientists often need to create a new derived column from existing columns in their Pandas dataframe, which usually involves writing custom code that can easily become a source of bugs. With Data Wrangler, all you need to do is provide examples of how you want the data in the derived column to look, and PROSE, our AI-powered program synthesis technology (the same technology that powers Microsoft Excel’s Flash Fill feature), will write the Python code for you. If you find an error in the results, you can correct it with a new example, and PROSE will rewrite the Python code to produce a better result. You can even modify the generated code yourself.

Extract first name by example

 

How to try Data Wrangler

To start using Data Wrangler today in Visual Studio Code Insiders, just download the Data Wrangler extension from the marketplace and visit our getting started page to try it out! You can then launch Data Wrangler from any Pandas dataframe output in a Jupyter Notebook, or by right-clicking any CSV or Parquet file in VS Code and selecting “Open in Data Wrangler”.

Data Wrangler entrypoint

This is the first release of Data Wrangler so we are looking for feedback as we iterate on the product. Please provide any product feedback here. If you run into any issues, please file a bug report in our Github repo here. Our plan is to move the extension from VS Code Insiders to VS Code in the near future.

The post Introducing the Data Wrangler extension for Visual Studio Code Insiders appeared first on Python.

March 17, 2023 01:29 AM UTC

March 16, 2023


PyCharm

PyCharm 2023.1 Release Candidate Is Out!

PyCharm 2023.1 is just around the corner! Check out the fixes and improvements we added to the PyCharm 2023.1 Release Candidate.

To see what has already been added in PyCharm 2023.1 during the early access program, take a look at our EAP blog posts.

The Toolbox App is the easiest way to get the EAP builds and keep both your stable and EAP versions up to date. You can also manually download the EAP builds from our website.

Download PyCharm 2023.1 RC

Faster variable value previews for large collections

We optimized the performance of the Special Variable window available in the Python Console and Debug Console. A preview of the calculated variable values is now displayed faster, even for large collections such as arrays, deques, dictionaries, lists, sets, frozensets, and tuples.

To see the full list of variable values, click on View as …, Load next to the variable preview.

Instant access to the Python Console and Python Packages tool window

As part of our work to refine the new UI, we’ve put the Python Console icon on the main editor screen so that you can instantly navigate to the console when needed. Next to the Python Console icon, you can now find the icon for the Python Packages tool window, so you can quickly manage project packages.

The Release Candidate also delivers the following fixes:

These are the most important updates for PyCharm 2023.1 Release Candidate. For the full list of improvements, check out the release notes. Share your feedback on the new features in the comments below, on Twitter, or in our issue tracker.

March 16, 2023 07:03 PM UTC


Stack Abuse

Python TypeError: < not supported between instances of str and int

Introduction

In this article, we'll be taking a look at a common Python 3 error: TypeError: '<' not supported between instances of 'str' and 'int'. This error occurs when an attempt is made to compare a string and an integer using the less than (<) operator. We will discuss the reasons behind this error and provide solutions for fixing it. Additionally, we will also cover how to resolve this error when dealing with lists, floats, and tuples.

Why You Get This Error

In most languages, you probably know that the less than (<) operator is used for comparison between two values. However, it is important to note that the two values being compared must be of the same type or be implicitly convertible to a common type. For example, two implicitly convertible types would be an integer and a float, since they're both numbers. But in this specific case, we're trying to compare a string and an integer, which are not implicitly convertible.

When you try to compare a string and an integer using one of the comparison operators, Python raises a TypeError because it cannot convert one of the values to a common type.

For instance, consider the following code:

num = 42
text = "hello"
if num < text:
    print("The number is smaller than the text.")   # confused.jpg

In this example, the comparison between num (an integer) and text (a string) using the less than operator will raise the error:

TypeError: '<' not supported between instances of 'str' and 'int'

How to Fix this Error

To fix this error, you need to ensure that both values being compared are of the same type or can be implicitly converted to a common type. In most cases, this means converting the integer to a string or vice versa.

Using a similar example as above, here's how you can resolve the error:

  1. Convert the integer to a string:
num = 42
text = "46"
if str(num) < text:
    print("The number is smaller than the text.")
  2. Convert the string to an integer:
num = 42
text = "46"

# Assuming 'text' represents a numeric value
numeric_text = int(text)
if num < numeric_text:
    print("The number is smaller than the text.")

This example works because the string does represent an integer. If, however, we were to try this fix on the example at the beginning of this article, it wouldn't work and Python would raise another error since the given text is not convertable to an integer. So while this fix works in some use-cases, it is not universal.

Fixing this Error with Lists

When working with lists, the error may occur when trying to compare an integer (or other primitive types) to a list.

my_list = ["1", "2", "3"]

if my_list > 3:
    print("Greater than 3!")
TypeError: '>' not supported between instances of 'list' and 'int'

In this case, the fix really depends on your use-case. When you run into this error, the common problem is that you meant to compare the variable to a single element in the list, not the entire list itself. Therefore, in this case you will want to access a single element in the list to make the comparison. Again, you'll need to make sure the elements are of the same type.

my_list = ["1", "2", "5"]

if int(my_list[2]) > 3:
    print("Greater than 3!")
Greater than 3!

Fixing this Error with Floats

When comparing floats and strings, the same error will arise as it's very similar to our first example in this article. To fix this error, you need to convert the float or the string to a common type, just like with integers:

num = 3.14
text = "3.13"

if float(text) < num:
    print("The text as a float is smaller than the number.")

Fixing this Error with Tuples

When working with tuples, just like lists, if you try to compare the entire tuple to a primitive value, you'll run into the TypeError. And just like before, you'll likely want to do the comparison on just a single value of the tuple.

Another possibility is that you'll want to compare all of the elements of the tuple to a single value, which we've shown below using the built-in all() function:

str_tuple = ("1.2", "3.2", "4.4")
my_float = 6.8

if all(tuple(float(el) < my_float for el in str_tuple)):
    print("All are lower!")

In this example, we iterate through all elements in the tuple, convert them to floats, and then make the comparison. The resulting tuple is then checked for all True values using all(). This way we're able to make an element-by-element comparison.
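Since all() accepts any iterable, the intermediate tuple() call can also be dropped for a slightly leaner equivalent:

str_tuple = ("1.2", "3.2", "4.4")
my_float = 6.8

# all() consumes the generator directly; no intermediate tuple is needed
if all(float(el) < my_float for el in str_tuple):
    print("All are lower!")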

Conclusion

In this article, we discussed the TypeError: '<' not supported between instances of 'str' and 'int' error in Python, which can occur when you try to compare a string and an integer using comparison operators. We provided solutions for fixing this error by converting the values to a common type and also covered how to resolve this error when working with lists, floats, and tuples.

By understanding the root cause of this error and applying the appropriate type conversion techniques, you can prevent this error from occurring and ensure your comparisons work as intended.

March 16, 2023 12:23 PM UTC


eGenix.com

Python Meeting Düsseldorf - 2023-03-22

The following is the announcement for our regional user group meeting in Düsseldorf, Germany.

Announcement

The next Python Meeting Düsseldorf will take place on:

2023-03-22, 18:00
Room 1, 2nd floor, Bürgerhaus Stadtteilzentrum Bilk
Düsseldorfer Arcaden, Bachstr. 145, 40217 Düsseldorf


Program

Talks registered so far

Charlie Clark:
        "A New XML Library: Pugixml"

Marc-Andre Lemburg:
        "Data Analysis with OpenSearch – Waiting times at the DUS airport"

Arkadius Schuchhardt:
        "Concurrent data loading by using the thread producer-consumer pattern"

Many Kasiriha:
        "An Introduction to the Robot Framework"

More talks are welcome. If you are interested in giving one, please contact us at info@pyddf.de.

Time and Location

We meet at 18:00 at the Bürgerhaus in the Düsseldorfer Arcaden.

The Bürgerhaus shares its entrance with the swimming pool and is located next to the entrance of the Düsseldorfer Arcaden's underground parking garage.

A large "Schwimm’ in Bilk" logo hangs above the entrance. Once through the door, turn immediately left to the two elevators and take them to the 2nd floor. The entrance to Room 1 is directly on your left as you exit the elevator.

>>> Entrance in Google Street View

Corona

The Corona restrictions have now been lifted. Caution is still advised, but is left up to each individual.

⚠️ Important: Please only register if you are absolutely certain that you will attend. Given the limited number of seats, we have no sympathy for last-minute cancellations or no-shows.

Introduction

The Python Meeting Düsseldorf is a regular event in Düsseldorf aimed at Python enthusiasts from the region.

Our PyDDF YouTube channel offers a good overview of past talks; we publish videos of the talks there after each meeting.

The meeting is organized by eGenix.com GmbH, Langenfeld, in cooperation with Clark Consulting & Research, Düsseldorf:

Format

The Python Meeting Düsseldorf uses a mix of (lightning) talks and open discussion.

Talks can be registered in advance or brought in spontaneously during the meeting. A projector with XGA resolution is available.

To register a (lightning) talk, simply send an informal email to info@pyddf.de

Cost Sharing

The Python Meeting Düsseldorf is organized by Python users for Python users.

Since the meeting room, projector, internet access, and drinks all incur costs, we ask attendees for a contribution of EUR 10.00 incl. 19% VAT. Pupils and students pay EUR 5.00 incl. 19% VAT.

We ask all attendees to bring the amount in cash.

Registration

Since we can only accommodate 25 people in the rented room, we ask that you register in advance.

Meeting registration: please register via Meetup

More Information

You can find more information on the meeting's website:

              https://pyddf.de/

Have fun!

Marc-Andre Lemburg, eGenix.com

March 16, 2023 09:00 AM UTC


Matt Layman

Cater Waiter, Template Bugs, and Type Fixes - Building SaaS with Python and Django #155

In this episode, I did another Exercism problem in Python that dug into Python sets. Once the exercise was complete, we went back to the issue list. I debugged and fixed a template error, then spent time improving types in my Django app.

March 16, 2023 12:00 AM UTC