
Planet Python

Last update: September 24, 2021 01:41 AM UTC

September 23, 2021


Python for Beginners

How to create an iterator in Python

Iterators are used to access the elements of an iterable object sequentially. We can create an iterator for any container object such as a Python dictionary, list, tuple, or set. In this article, we will discuss how to create an iterator in Python. We will also learn how to create custom iterators by implementing the __iter__() and __next__() methods.

Create an iterator using built-in methods

We can create an iterator using the iter() function and the __next__() method. The iter() function takes a container object such as a list, tuple, or set and returns an iterator with which we can access the elements of the container object.

To create an iterator for any container object using the iter() function, we just have to pass the object to the iter() function. The function creates an iterator and returns a reference to it. We can create an iterator using the iter() function as follows.

myList = [1, 2, 3, 4, 5, 6, 7]
myIter = iter(myList)
print("The list is:", myList)
print("The iterator is:", myIter)

Output:

The list is: [1, 2, 3, 4, 5, 6, 7]
The iterator is: <list_iterator object at 0x7f73fed18070>

In the output, you can observe that a list_iterator object has been created upon the execution of the iter() function.

How to access elements from an iterator?

To access the elements of the container object through the iterator in sequence, we can use a for loop as follows.

myList = [1, 2, 3, 4, 5, 6, 7]
myIter = iter(myList)
print("The list is:", myList)
print("The elements in the iterator are:")
for i in myIter:
    print(i)

Output:

The list is: [1, 2, 3, 4, 5, 6, 7]
The elements in the iterator are:
1
2
3
4
5
6
7

If you need to access the elements one by one, you can use the next() function or the __next__() method.

To traverse the iterator using the next() function, we pass the iterator as an input argument to the function. It returns the next element of the iterator. The iterator also remembers its position, so when next() is called again, it returns the next element that has not been traversed yet. This can be observed in the following example.

myList = [1, 2, 3, 4, 5, 6, 7]
myIter = iter(myList)
print("The list is:", myList)
print("The elements in the iterator are:")
element = next(myIter)
print(element)
element = next(myIter)
print(element)
element = next(myIter)
print(element)
element = next(myIter)
print(element)
element = next(myIter)
print(element)
element = next(myIter)
print(element)
element = next(myIter)
print(element)

Output:

The list is: [1, 2, 3, 4, 5, 6, 7]
The elements in the iterator are:
1
2
3
4
5
6
7


The __next__() method works in a similar way to the next() function. Whenever the __next__() method is invoked on the iterator, it returns the next element that has not yet been traversed.

myList = [1, 2, 3, 4, 5, 6, 7]
myIter = iter(myList)
print("The list is:", myList)
print("The elements in the iterator are:")
element = myIter.__next__()
print(element)
element = myIter.__next__()
print(element)
element = myIter.__next__()
print(element)
element = myIter.__next__()
print(element)
element = myIter.__next__()
print(element)
element = myIter.__next__()
print(element)
element = myIter.__next__()
print(element)

Output:

The list is: [1, 2, 3, 4, 5, 6, 7]
The elements in the iterator are:
1
2
3
4
5
6
7


When there is no element left to traverse and we call the next() function or the __next__() method, a StopIteration exception is raised.

myList = [1, 2, 3, 4, 5, 6, 7]
myIter = iter(myList)
print("The list is:", myList)
print("The elements in the iterator are:")
element = myIter.__next__()
print(element)
element = myIter.__next__()
print(element)
element = myIter.__next__()
print(element)
element = myIter.__next__()
print(element)
element = myIter.__next__()
print(element)
element = myIter.__next__()
print(element)
element = myIter.__next__()
print(element)
element = myIter.__next__()
print(element)

Output:

Traceback (most recent call last):
  File "/home/aditya1117/PycharmProjects/pythonProject/webscraping.py", line 19, in <module>
    element = myIter.__next__()
StopIteration

It is advised to call these functions inside a Python try-except block so that the exception can be handled gracefully.
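
For instance, the traversal above can be written so that the exception simply ends the loop. Here is a minimal sketch using the same list as before.

myList = [1, 2, 3, 4, 5, 6, 7]
myIter = iter(myList)
while True:
    try:
        element = next(myIter)
        print(element)
    except StopIteration:
        # No elements are left, so exit the loop instead of crashing
        break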

How to create a custom iterator in Python

To create a custom iterator, we can implement the __iter__() and __next__() methods in a class of our own. We will understand how to create a custom iterator using the following example.

Suppose that you want to create an iterator that returns the square of each element of a list or tuple when we iterate through the object using the iterator. For this, we will implement the __iter__() method and the __next__() method.

The constructor of the iterator will accept the list or tuple and the total number of elements in it, and then initialize the iterator object. As we need to keep track of the last element that was traversed, we will initialize a count field and set it to 0.

class SquareIterator:
    def __init__(self, data, noOfElements):
        self.data = data
        self.noOfElements = noOfElements
        self.count = 0

   

The __iter__() method is used to initialise the iterator; here it simply returns the iterator object itself. The __iter__() method is implemented as follows.

class SquareIterator:
    def __init__(self, data, noOfElements):
        self.data = data
        self.noOfElements = noOfElements
        self.count = 0

    def __iter__(self):
        return self

   

After overriding the __iter__() method, we will override the __next__() method. The __next__() method is used to return the next element that has not been traversed yet. Here, we need to return the square of the next element.

As we have initialized the count to 0 in the constructor, the __next__() method returns the square of the element at the current index and then increments the count. Each time the __next__() method is invoked, the square of the element at the current count is given as output and the count moves forward by one. When the number of traversed elements equals the total number of elements in the list or tuple, we raise the StopIteration exception, which stops the iteration. The __next__() method can be defined as follows.

class SquareIterator:
    def __init__(self, data, noOfElements):
        self.data = data
        self.noOfElements = noOfElements
        self.count = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.count < self.noOfElements:
            square = self.data[self.count] ** 2
            self.count = self.count + 1
            return square
        else:
            raise StopIteration

The entire program to create the iterator is as follows.

class SquareIterator:
    def __init__(self, data, noOfElements):
        self.data = data
        self.noOfElements = noOfElements
        self.count = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.count < self.noOfElements:
            square = self.data[self.count] ** 2
            self.count = self.count + 1
            return square
        else:
            raise StopIteration


myList = [1, 2, 3, 4, 5, 6, 7]
myIter = SquareIterator(myList, 7)
print("The list is:", myList)
print("The elements in the iterator are:")
element = myIter.__next__()
print(element)
element = myIter.__next__()
print(element)
element = myIter.__next__()
print(element)
element = myIter.__next__()
print(element)
element = myIter.__next__()
print(element)
element = myIter.__next__()
print(element)
element = myIter.__next__()
print(element)

Output:

The list is: [1, 2, 3, 4, 5, 6, 7]
The elements in the iterator are:
1
4
9
16
25
36
49

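As an aside, the same squaring behaviour can often be written far more concisely as a generator function. The following is a minimal sketch, not part of the class-based approach above; Python builds the iterator machinery, including the StopIteration handling, for us.

def squares(data):
    # Each yield produces the next square on demand
    for element in data:
        yield element ** 2


myList = [1, 2, 3, 4, 5, 6, 7]
for value in squares(myList):
    print(value)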

Conclusion

In this article, we have studied two ways to create an iterator in Python. You can use custom iterators to perform different operations on the elements of a container object while traversing it. It is advised to write a few programs that create iterators so that you can gain a better understanding of the topic.

The post How to create an iterator in Python appeared first on PythonForBeginners.com.

September 23, 2021 01:13 PM UTC


Mike Driscoll

Python 101 – How to Create a Python Package

When you create a Python file, you are creating a Python module. Any Python file that you create can be imported by another Python script. Thus, by definition, it is also a Python module. If you have two or more related Python files, then you may have a Python package.

Some organizations keep all their code to themselves. This is known as closed-source. Python is an open-source language and most of the Python modules and packages that you can get from the Python Package Index (PyPI) are all free and open-source as well. One of the quickest ways to share your package or module is to upload it to the Python Package Index or Github or both.

In this article, you will learn about the following topics:

  • Creating a module
  • Creating a package
  • Packaging a project for PyPI
  • Creating project files and setup.py
  • Generating a Python wheel
  • Uploading to PyPI

The first step in the process is to understand what creating a reusable module looks like. Let’s get started!

Creating a Module

Any Python file you create is a module that you can import. You can try it out with some of the examples from this article by adding a new file to any of the code folders and attempting to import one of the modules there. For example, if you have a Python file named a.py and then create a new file named b.py, you can import a into b through the use of import a.

Of course, that’s a silly example. Instead, you will create a simple module that has some basic arithmetic functions in it. You can name the file arithmetic.py and add this code to it:

# arithmetic.py

def add(x, y):
    return x + y

def divide(x, y):
    return x / y

def multiply(x, y):
    return x * y

def subtract(x, y):
    return x - y

This code is very naive. You have no error handling at all, for example. What that means is that you could divide by zero and cause an exception to be thrown. You could also pass incompatible types to these functions, like a string and an integer — that would cause a different kind of exception to be raised.
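
For instance, a slightly more defensive divide() might guard against division by zero. The following is a hypothetical variant for illustration; the module in this article intentionally stays naive.

def divide(x, y):
    # Fail early with a descriptive message instead of a bare ZeroDivisionError
    if y == 0:
        raise ValueError("Cannot divide by zero")
    return x / y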

However, for learning purposes, this code is adequate. You can prove that it is importable by creating a test file. Create a new file named test_arithmetic.py and add this code to it:

# test_arithmetic.py

import arithmetic
import unittest

class TestArithmetic(unittest.TestCase):

    def test_addition(self):
        self.assertEqual(arithmetic.add(1, 2), 3)

    def test_subtraction(self):
        self.assertEqual(arithmetic.subtract(2, 1), 1)

    def test_multiplication(self):
        self.assertEqual(arithmetic.multiply(5, 5), 25)

    def test_division(self):
        self.assertEqual(arithmetic.divide(8, 2), 4)

if __name__ == '__main__':
    unittest.main()

Save your test in the same folder as your module. Now you can run this code using the following command:

$ python3 test_arithmetic.py 
....
----------------------------------------------------------------------
Ran 4 tests in 0.000s

OK

This demonstrates that you can import arithmetic.py as a module. These tests also show that the basic functionality of the code works. You can enhance these tests by testing division by zero and mixing strings and integers. Those kinds of tests will currently fail. Once you have a failing test, you can follow the Test Driven Development methodology to fix the issues.
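
One way to capture the current behaviour is to assert that the raw exception is raised; a test expecting a friendlier error would fail until you add handling. Here is a sketch of the former, added inside the TestArithmetic class:

    def test_division_by_zero(self):
        # The naive divide() lets Python raise ZeroDivisionError on its own
        with self.assertRaises(ZeroDivisionError):
            arithmetic.divide(8, 0)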

Now let’s find out how to make a Python package!

Creating a Package

A Python package is one or more files that you plan on sharing with others, usually by uploading them to the Python Package Index (PyPI). Packages are generally made from a directory of files rather than a single file. Inside that directory you will have a special __init__.py file. When Python sees the __init__.py file, it knows that the folder is importable as a package.

There are a couple of ways to transform arithmetic.py into a package. The simplest is to move the code from arithmetic.py into arithmetic/__init__.py:

 1. Create a folder named arithmetic
 2. Move (and rename) arithmetic.py to arithmetic/__init__.py
 3. Run your tests again

That last step is extremely important! If your tests still pass, then you know your conversion from a module to a package worked. To test out your package, open up a Command Prompt if you’re on Windows or a terminal if you’re on Mac or Linux. Then navigate to the folder that contains the arithmetic folder, but not inside of it. You should now be in the same folder as your test_arithmetic.py file. At this point you can run python test_arithmetic.py and see if your efforts were successful.

It might seem silly to simply put all your code in a single __init__.py file, but that actually works fine for files up to a few thousand lines.

The second way to transform arithmetic.py into a package is similar to the first, but involves using more files than just __init__.py. In real code the functions/classes/etc. in each file would be grouped somehow — perhaps one file for all your package’s custom exceptions, one file for common utilities, and one file for the main functionality.

For our example, you’ll just split the four functions in arithmetic.py into their own files. Go ahead and move each function from __init__.py into its own file. Your folder structure should look like this:

arithmetic/
    __init__.py
    add.py
    subtract.py
    multiply.py
    divide.py

For the __init__.py file, you can add the following code:

# __init__.py
from .add import add
from .subtract import subtract
from .multiply import multiply
from .divide import divide
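
Each of the other files now holds just its own function. For example, add.py would contain nothing but the add() function:

# add.py

def add(x, y):
    return x + y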

Now that you’ve made these changes, what should be your next step? Hopefully, you said, “Run my tests!” If your tests still pass, then you haven’t broken your API.

Right now your arithmetic package is only available to your other Python code if you happen to be in the same folder as your test_arithmetic.py file. To make it available in your Python session or in other Python code you can use Python’s sys module to add your package to the Python search path. The search path is what Python uses to find modules when you use the import keyword. You can see what paths Python searches by printing out sys.path.

Let’s pretend that your arithmetic folder is in this location: /Users/michael/packages/arithmetic. To add that to Python’s search path, you can do this:

import sys

sys.path.append("/Users/michael/packages/arithmetic")
import arithmetic

print(arithmetic.add(1, 2))

This will add arithmetic to Python’s path so you can import it and then use the package in your code. However, that’s really awkward. It would be nice if you could install your package using pip so you don’t have to mess around with the path all the time.

Let’s find out how to do that next!

Packaging a Project for PyPI

When it comes to creating a package for the Python Package Index (PyPI), you will need some additional files. There is a good tutorial on the process for creating and uploading a package to PyPI here:

https://packaging.python.org/tutorials/packaging-projects/

The official packaging instructions recommend that you set up a directory structure like this:

my_package/
    LICENSE
    README.md
    arithmetic/
        __init__.py
        add.py
        subtract.py
        multiply.py
        divide.py
    setup.py
    tests/

The tests folder can be empty. This is the folder where you would include tests for your package. Most developers use Python’s unittest or the pytest framework for their tests. For this example, you can leave the folder empty.

Let’s move on and learn about the other files you need to create in the next section!

Creating Project Files

The LICENSE file is where you mention what license your package has. This tells the users of the package what they can and cannot do with your package. There are a lot of different licenses you can use. The GPL and MIT licenses are just a couple of popular examples.

The README.md file is a description of your project, written in Markdown. You will want to write about your project in this file and include any information about dependencies that it might need. You can give instructions for installation as well as example usage of your package. Markdown is quite versatile and even lets you do syntax highlighting!

The other file you need to supply is setup.py. That file is more complex, so you’ll learn about that in the next section.

Creating setup.py

There is a special file named setup.py that is used as a build script for Python distributions. It is used by setuptools, which does the actual building for you. If you’d like to know more about setuptools, then you should check out the following:

You can use the setup.py to create a Python wheel. The wheel is a ZIP-format archive with a specially formatted name and a .whl extension. It contains everything necessary to install your package. You can think of it as a zipped version of your code that pip can unzip and install for you. The wheel format is defined in PEP 427, which you can read about here:

https://www.python.org/dev/peps/pep-0427/

Once you’re done reading all that documentation (if you wanted to), you can create your setup.py and add this code to it:

import setuptools

with open("README.md", "r") as fh:
    long_description = fh.read()

setuptools.setup(
    name="arithmetic-YOUR-USERNAME-HERE", # Replace with your own username
    version="0.0.1",
    author="Mike Driscoll",
    author_email="driscoll@example.com",
    description="A simple arithmetic package",
    long_description=long_description,
    long_description_content_type="text/markdown",
    url="https://github.com/driscollis/arithmetic",
    packages=setuptools.find_packages(),
    classifiers=[
        "Programming Language :: Python :: 3",
        "License :: OSI Approved :: MIT License",
        "Operating System :: OS Independent",
    ],
    python_requires='>=3.6',
)

The first step in this code is to import setuptools. Then you read your README.md file into a variable that you will use soon. The last bit is the bulk of the code. Here you call setuptools.setup(), which can take in quite a few different arguments. The example above is only a sampling of what you can pass to this function. To see the full listing, you’ll need to consult the setuptools documentation.

Most of the arguments are self-explanatory, so let’s focus on the less obvious ones. The packages argument is a list of the packages that make up your distribution. In this case, you use find_packages() to find the necessary packages for you automatically. The classifiers argument is used for passing additional metadata to pip. For example, this code tells pip that the package is Python 3 compatible.

Now that you have a setup.py, you are ready to create a Python wheel!

Generating a Python Wheel

The setup.py is used to create Python wheels. It’s always a good idea to make sure you have the latest version of setuptools and wheel installed, so before you create your own wheel, you should run the following command:

python3 -m pip install --user --upgrade setuptools wheel

This will update the packages if there is a newer version than the one you currently have installed. Now you are ready to create a wheel yourself. Open up a Command Prompt or terminal application and navigate to the folder that contains your setup.py file. Then run the following command:

python3 setup.py sdist bdist_wheel

This command will output a lot of text, but once it has finished you will find a new folder named dist that contains the following two files:

arithmetic_YOUR_USERNAME_HERE-0.0.1-py3-none-any.whl
arithmetic_YOUR_USERNAME_HERE-0.0.1.tar.gz

The tar.gz is a source archive, which means it has the Python source code for your package inside of it. Your users can use the source archive to build the package on their own machines, if they need to. The whl format is an archive that is used by pip to install your package on your user’s machine.

You can install the wheel using pip directly, if you want to:

python3 -m pip install arithmetic_YOUR_USERNAME_HERE-0.0.1-py3-none-any.whl

But the normal method would be to upload your package to the Python Package Index (PyPI) and then install it. Let’s discover how to get your amazing package on PyPI next!

Uploading to PyPI

The first step to upload a package to PyPI is to create an account on Test PyPI. This allows you to test that your package can be uploaded to a test server and installed from that test server. To create an account, go to the following URL and follow the steps on that page:

https://test.pypi.org/account/register/

Now you need to create a PyPI API token. This will allow you to upload the package securely. You can create one from your account settings on Test PyPI.

You can limit a token’s scope. However, you don’t need to do that for this token as you are creating it for a new project. Make sure you copy the token and save it off somewhere BEFORE you close the page. Once the page is closed, you cannot retrieve the token again. You will be required to create a new token instead.

Now that you are registered and have an API token, you will need to get the twine package. You will use twine to upload your package to PyPI. To install twine, you can use pip like this:

python3 -m pip install --user --upgrade twine

Once installed, you can upload your package to Test PyPI using the following command:

python3 -m twine upload --repository testpypi dist/*

Note that you will need to run this command from within the folder that contains the setup.py file as it is copying all the files in the dist folder to Test PyPI. When you run this command, it will prompt you for a username and password. For the username, you need to use __token__. The password is the token value that is prefixed with pypi-.
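
If you would rather not type the token in every time, twine can also read credentials from a ~/.pypirc file. This is an optional convenience, sketched here with a placeholder where your real token would go:

# ~/.pypirc
[testpypi]
username = __token__
password = pypi-YOUR-TOKEN-HERE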

When this command runs, you should see output similar to the following:

Uploading distributions to https://test.pypi.org/legacy/
Enter your username: [your username]
Enter your password:
Uploading arithmetic_YOUR_USERNAME_HERE-0.0.1-py3-none-any.whl
100%|█████████████████████| 4.65k/4.65k [00:01<00:00, 2.88kB/s]
Uploading arithmetic_YOUR_USERNAME_HERE-0.0.1.tar.gz
100%|█████████████████████| 4.25k/4.25k [00:01<00:00, 3.05kB/s]

At this point, you should now be able to view your package on its project page on Test PyPI.

Now you can test installing your package from Test PyPI by using the following command:

python3 -m pip install --index-url https://test.pypi.org/simple/ --no-deps arithmetic-YOUR-USERNAME-HERE

If everything worked correctly, you should now have the arithmetic package installed on your system. Of course, this tutorial showed you how to package things up for Test PyPI. Once you have verified that it works, you will need to do the following to release on the real PyPI:

  • Register an account on the real PyPI and create an API token for it
  • Upload your package with python3 -m twine upload dist/* (that is, without the --repository testpypi flag)
  • Install it with python3 -m pip install followed by your package name

Now you know how to distribute a package of your own creation on the Python Package Index!

Wrapping Up

Python modules and packages are what you import in your programs. They are, in many ways, the building blocks of your programs. In this article, you learned about the following:

  • Creating a module
  • Creating a package
  • Packaging a project for PyPI
  • Creating project files and setup.py
  • Generating a Python wheel
  • Uploading to PyPI

At this point, you not only know what modules and packages are, but also how to distribute them via the Python Package Index. Now you and any other Python developer can download and install your packages. Congratulations! You are now a package maintainer!

Related Reading

This article is based on a chapter from Python 101, 2nd Edition, which you can purchase on Leanpub or Amazon.

If you’d like to learn more Python, then check out these tutorials:

The post Python 101 – How to Create a Python Package appeared first on Mouse Vs Python.

September 23, 2021 12:30 PM UTC


EuroPython Society

List of EPS Board Candidates for 2021/2022

At this year’s EuroPython Society General Assembly (GA), planned for October, we will vote in a new board of the EuroPython Society for the term 2021/2022.

List of Board Candidates

The EPS bylaws require one chair, one vice chair, and two to seven board members. The following candidates have stated their willingness to work on the EPS board. We are presenting them here (in alphabetical order by surname).

We will be updating this list in the days before the GA. Please send in any nominations or self-nominations to board@europython.eu.

Please note that our bylaws do not restrict nominations to people on this list. It is even possible to self-nominate or nominate other candidates at the GA itself. However, in the interest of giving members a better chance to review the candidate list, we’d like to encourage all nominations to be made before the GA.

The following fine folks have expressed their desire to run as a group for the next EPS board elections: Patrick Arminio, Prof. Martin Christen, Nicolás Demarchi, Raquel Dou, Anders Hammarquist, Cheuk Ting Ho, Francesco Pierfederici and Silvia Uberti. They have a good track record of working well together and share a common vision for the betterment of the EPS via strengthening ties to the larger Python community. They will present their plans and vision during the GA.

Marc-André Lemburg, who has been serving on the EPS board since 2012 and as its chair since 2017, is not running for the board again, as already announced at last year’s GA. He’s looking forward to enjoying the conference from the attendee perspective again.

Patrick Arminio

Software Developer / Python Italia Chair

Patrick started his journey with the Python community at PyCon Italia 2 in 2008. After many years of helping run PyCon Italia (and other conferences) as a volunteer, he became the Chair of Python Italia in 2017.

He has also been nominated as a PSF Fellow for his contributions to conferences and to open source software.

He currently works as a Software Engineer in London.

Prof. Martin Christen

Teaching Python / using Python for research projects

Martin Christen is a professor of Geoinformatics and Computer Graphics at the Institute of Geomatics at the University of Applied Sciences Northwestern Switzerland (FHNW). His main research interests are geospatial Virtual- and Augmented Reality, 3D geoinformation, and interactive 3D maps.

Martin is very active in the Python community. He teaches various Python-related courses and uses Python in many research projects. He organizes the PyBasel meetup, the local Python user group of Northwestern Switzerland, as well as the yearly GeoPython conference. He is also a board member of the Python Software Verband e.V.

I would be glad to help with EuroPython, to be part of a great team that makes the next edition of EuroPython even better. I’m looking forward to a great physical/hybrid conference in Dublin.

Artur Czepiel

Software developer

Artur started writing in Python around 2008, originally using it mostly to implement backends for websites and later expanding to other areas.

He joined the EuroPython team in 2017 after watching a talk about the state of the conference software at the time. He then contributed patches, joined various workgroups, and helped with the EuroPython 2018 and 2019 editions.

For the 2019 conference, he also joined the board, where he helped with due diligence in the RFP process but kept his focus on web/infra, including major updates to the website software and support for other workgroups. Outside of EuroPython, he co-organises two local Python meetups in Kraków, Poland, where he’s based. He was also part of the team behind Remote Python Pizza and provided minor software updates to other conferences.

I would like to join the EPS board again, with my main focus being (again) infrastructure. I've learned a lot, both during the 2019 term and since then, and I believe that for the 2022 conference we can improve the conference setup even more. My main focus will be the ease of organisation and ultimately making the conference experience better for everyone.

Nicolás Demarchi

Pythonista / Software Engineer

Nicolás is a self-taught software engineer who has worked professionally for more than 15 years. After participating in some Linux User Groups and the Mozilla community, Nicolás joined the Python community around 2012, fell in love with it, and never left. He is a founder and has been a board member of the Python Argentina NGO since 2016. In the PyAr community he has taken part in several events and conferences as an organizer and/or speaker, ranging from Python Days in various cities to PyCamp and the Python Argentina national conference, and he was a core organizer of the 2018 edition in Buenos Aires (an open and free conference with ~1500 attendees). Since 2014 Nicolás has been maintaining the Python Argentina infrastructure that supports the mailing list, webpages, etc. He hosted the Buenos Aires Python Meetup (and still helps a bit). In June 2019, Nicolás moved to Amsterdam, where he is currently living and working. A few months after the move, he joined the organization of the Python Amsterdam meetup, and he is working with a small team to build the local community: py.amsterdam. He also joined the https://pycamp.es/ team, trying to replicate the PyCamp Argentina experience in Europe. In 2020 he volunteered as a core organizer in the Media Workgroup of EuroPython 2020 Online.
He joined the EPS board in 2021 and helped to organize EuroPython 2021 Online.

I would like to continue on the EPS board because I think EuroPython is the event connecting all European communities and is therefore the right place to invest my time. In addition, I believe I can learn a lot as a volunteer. For 2022 I want to work for the whole European Python community to have a better relationship with the EPS and to work on other smaller/local events apart from EuroPython.

Raquel Dou

Linguist / Python enthusiast

Raquel first met Python in 2013, during her MSc studies in Evolution of Language and Cognition, where she used Python to model the evolution of a simple communication system over time. She operates a small business providing language and technical services, in which Python is one of her primary tools.

She first attended EuroPython when it took place on her doorstep (Edinburgh) in 2018, and was an onsite volunteer. Since then she has remained closely engaged with the EPS, as well as with the organisation and execution of the conferences. She has been serving on the EPS board since 2019, working closely with the brilliant teams that delivered the two recent EuroPython Online editions. In 2021, besides leading the two amazing Sponsor and Support teams, she was also heavily involved in the conference lineup and speaker management.

During these three years, she has experienced warmth, openness, creativity, and a desire to do good in every aspect of her engagement with the Python community, which she proudly serves. For the next edition, she hopes to finally meet in Dublin every one of the volunteers she has worked so closely with for years. She shares a community-building vision with the amazing EuroPython team and would love to continue this exhilarating journey with them to take the Society to new heights.

Anders Hammarquist

Pythonista / Consultant / Software architect

Anders has been running his own Python consultancy business, AB Struse, since 2019 and is currently mostly involved with using Python in industrial automation. He has been using Python since 1995 and has fostered its use in at least four companies.

He helped organize EuroPython 2004 and 2005, and has attended and given talks at several EuroPythons since then. He has handled the Swedish financials of the EuroPython Society since 2016 and has served as board member since 2017.

Cheuk Ting Ho

Pythonista / Developer Advocate / Data Scientist

After spending five years researching theoretical physics at the Hong Kong University of Science and Technology, Cheuk transferred the analytical and logical skills she honed in natural science into a career in data science. Cheuk worked as a Data Scientist before joining a team of developers building a revolutionary graph database.

Cheuk constantly contributes to the community by giving AI and deep learning workshops, organizing sprints for open source projects, contributing to open source projects including Pandas, Keras, Scikit-learn and Dateutil, and maintaining open-source libraries. On top of speaking at conferences, Cheuk has been part of the EuroPython organizing team as a member of the programming workgroup since 2019, and she hosted the lightning talks that same year.

Last year, Cheuk joined the EuroPython Society board and was nominated as a Python Software Foundation Fellow. Cheuk has been leading the Financial Aid team to provide accessible tickets for people around the world to join EuroPython. Cheuk has also started a speaker mentorship program and organized workshops for new speakers. Believing in the benefits that diversity and inclusion bring to the Society, Cheuk would like to continue bringing new faces into it and to keep connecting the people in it. In 2021, hopefully we will meet again in person, and Cheuk would like to make sure that EuroPython is accessible to everyone, both online and in person.

Francesco Pierfederici

Pythonic Beer Brewer and Drinker

The year 2000 was the year that Python saved Francesco from Perl and Java. He has worked on a large variety of projects over the last 20-odd years, all involving Python, mostly in scientific environments.

He is currently driving a 30m telescope with Python at IRAM (https://www.iram-institute.org). In his free time, he is trying to optimise his beer brewing with a host of sensors and microcontrollers running MicroPython, and still failing at that.

He has been volunteering with and in the EuroPython community since the conference in Rimini in 2017 and has helped with the website since 2019.

Why serve on the EPS board for a second year? Francesco loves the EuroPython conference and its volunteers, organisers and attendees. He would love the opportunity to give back to this community, and to the Python community in general, which has given him so much.

Silvia Uberti

Sysadmin / IT Consultant

She is a Sysadmin with a degree in Network Security, really passionate about technology, traveling and her piano.

She’s an advocate for women in STEM disciplines and supports inclusiveness of underrepresented people in tech communities.

She fell in love with Python and its warm community during PyCon Italia in 2014 and became a member of the EuroPython Sponsor Workgroup in 2017.

She enjoys working in it a lot and wants to help more!

Chair / Vice-Chair Nominations

Raquel Dou is running for the chair position and Artur Czepiel for the vice chair position.

What does the EPS Board do?

The EPS board runs the day-to-day business of the EuroPython Society, including running the EuroPython conference events. It is allowed to enter contracts for the society and handle any issues that have not been otherwise regulated in the bylaws or by the General Assembly. Most business is handled on the board’s Telegram group or by email on the board mailing list. Board meetings are usually run as conference calls.

It is important to note that the EPS board is an active board, i.e. the board members are expected to put in a significant amount of time and effort towards the goals of the EPS and for running the EuroPython conference. This usually means 200+ hours work over a period of one year, with most of this being needed in the last six months before the conference. Many board members put in even more work to make sure that the EuroPython conferences become a success.

Board members are generally expected to take on leadership roles within the EuroPython Workgroups in order to facilitate good communication and quick decision making. They should be passionate about EuroPython, the Python community and working in a team of volunteers.

Enjoy,
EuroPython Society
https://www.europython-society.org/

September 23, 2021 11:32 AM UTC


Python Software Foundation

Katia Lira Awarded the PSF Community Service Award for Q2 2020

Katia Lira, a Software Engineer from Mexico City, has been awarded the Python Software Foundation 2020 Q2 Community Service Award.


RESOLVED, that the Python Software Foundation award the Q2 2020 Community Service Award to Katia Lira for her contributions to PyCon LatAm as conference chair, which held its inaugural conference in 2019. Additionally, Katia is the DEFNA (Django Events Foundation North America) President and has collaborated in crafting the vision of PyLadies Global. She hosts and produces multiple Python/tech/community podcasts like El Dev Show in Spanish. She's a PyCon speaker and is well respected in the community.


We interviewed Katia to learn more about her inspiration and work with the Python community. We also asked two of Katia's associates, Valery Calderon and Cristian Maureira-Fredes, to shed more light on Katia's impact on the community.

What was your earliest memory of how you got into tech?

Growing up, I wanted to be an Architect. I loved doing the blueprints in AutoCAD. Then I switched to study web design but quickly found it easier to code than to do the UI. It felt effortless to use code to make a button send a form or trigger an animation, so I never looked back.

What was your earlier involvement with the Python community?

One of my best friends invited me to PyCon US in 2016. At the time, I was still learning Python and Django and was unaware of communities and conferences like that.

What inspires you to volunteer your time and resources in the Python Community?

It's always the conversations with people that make me want to continue volunteering and organizing spaces for sharing knowledge and building community, prioritizing Spanish as the language to share and engage.

How has your involvement in the Python community supported your career?

The most important thing is inspiration. Being involved in the community has widened my views on opportunities available to me and also the possibilities for growth. I discovered open source projects that push you to explore new tools and grow skills outside of day-to-day work.

Another thing is just being aware of the conversations around hiring and work. Especially when people are open about how to prepare for a technical interview and how they struggle with growing into a more senior role.

How has Covid affected your work with the Python community and what steps are you taking to push the community forward during these trying times?

I think we all tried to compensate with remote by joining as many virtual spaces as possible and it has been taxing on many of us. 

For the two conferences I volunteer at, PyCon LatAm and DjangoCon US, we took 2020 off from having any events. That helped with not burning out our volunteers and organizers. And it made us ready for 2021, which has been easier because we had fewer uncertainties; we jumped in knowing it was going to be fully remote and that the community members missed each other.

Katia Lira's Impact on the Python community


Cristián Maureira-Fredes, Software Engineer and R&D manager, speaks on Katia's contributions to the PyCon LatAm community and the larger Python community:
Katia has been a fundamental person in the whole PyCon LatAm initiative, being able to unify the many Latin American Python communities under the same umbrella. Hosting a LatAm conference seemed quite impossible if you ask me, but together with a wonderful team, they proved me wrong.

The enthusiasm and motivation I felt as a South American made me push forward the Python groups in my own country, from where we decided to organize our first small conference to a PyDay event. I asked Katia to be a keynote speaker and she agreed without any hesitation. 

Thanks to her talk, many people felt that being from LatAm was not a barrier to push for global communities and that people were responsible to make initiatives like PyLadies and PyCon LatAm as successful as they are.

Katia's keynote also motivated a lot of people from Chile. And this has led to an increase in community activities, beginning with new initiatives like the first PyLadies chapter - PyLadies Santiago.

And Katia's impact spans beyond the LatAm region to the global Spanish-speaking community.
Valery Calderon, a Data Engineer, also speaks on Katia's impact on the LatAm community:

Katia spear-headed the PyCon LatAm initiative. She is always open to helping people in the community by mentoring, giving talks, volunteering, and helping to organize events. She has also been of tremendous support to the past and present initiatives of creating room for diverse people within the Latin American community in the PSF.

Katia specifically helped me to propose my candidacy for the PyLadies Global Council.

In Latin America, there is a big gap in the culture of communities. And to make it better and inclusive, there's a lot of work that has been done and still needs to be done. Katia is helping to bridge this gap, which is a huge impact on the LatAm Python community.

We at the Python Software Foundation wish to once again congratulate and celebrate Katia Lira for her tremendous impact on the Python LatAm community, PyCon LatAm, and the wider Python community.

September 23, 2021 04:42 AM UTC


Michał Bultrowicz

Setting up and syncing config on two laptops

I’ve created a script that should set up a fresh Manjaro with all the software and configuration that I want in a workstation. It can also update the setup when rerun. Now I have two laptops that behave and look the same, and it’s easy for me to maintain that state. Oh, I wanted that for a long time :)

September 23, 2021 12:00 AM UTC

September 22, 2021


Python for Beginners

Difference between yield and return in Python

You might have used the yield and return statements while programming in Python. In this article, we will discuss the theoretical concepts behind the return and yield keywords. We will also look at the difference between how the yield and return statements work in Python.

What are yield and return in Python?

The yield and return keywords are used in a function to pass values from the function back to its caller.

The return keyword

A return statement is used in a function to return an object to the caller. We can return a single value such as a number or a string, or a container object such as a Python dictionary, a tuple, or a list.

For example, the sumOfNums() function returns a number to the caller in the following source code.

def sumOfNums(num1, num2):
    result = num2 + num1
    return result


output = sumOfNums(10, 20)
print("Sum of 10 and 20 is:", output)

Output:

Sum of 10 and 20 is: 30

Similarly, we can use return statements to return container objects, as shown in the following example. Here the function square() takes a list of numbers as input and returns a list of the squares of the elements of the input list.

def square(list1):
    newList = list()
    for i in list1:
        newList.append(i * i)
    return newList


input_list = [1, 2, 3, 4, 5, 6]
print("input list is:", input_list)
output = square(input_list)
print("Output list is:", output)

Output:

input list is: [1, 2, 3, 4, 5, 6]
Output list is: [1, 4, 9, 16, 25, 36]

We can have more than one return statement in a function. However, once a return statement is executed, the function terminates and the statements written after it will never be executed.
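
Here is a small sketch illustrating this: only one of the two return statements runs on any given call, and the final print() can never execute.

def sign(number):
    if number >= 0:
        return "non-negative"
    return "negative"
    print("This line is never reached")  # unreachable after return


print(sign(5))
print(sign(-3))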

The yield keyword 

The yield statements are also used in a function to pass values back to the caller, but the yield statement works in a different way. When a function contains a yield statement, calling it returns a generator object to the caller instead of executing the function body immediately. The values in the generator object can then be accessed using the next() function or a for loop, as follows.

def square(list1):
    newList = list()
    for i in list1:
        newList.append(i * i)
    yield newList


input_list = [1, 2, 3, 4, 5, 6]
print("input list is:", input_list)
output = square(input_list)
print("Output from the generator is:", output)
print("Elements in the generator are:",next(output))

Output:

input list is: [1, 2, 3, 4, 5, 6]
Output from the generator is: <generator object square at 0x7fa59b674a50>
Elements in the generator are: [1, 4, 9, 16, 25, 36]

A function can have more than one yield statement. When the first yield statement is executed, it pauses the execution of the function and hands the yielded value to the caller. When we request the next value from the generator using the next() function, the function resumes and executes until the next yield statement. This process can be continued until the last statement of the function. You can understand this using the following example.

def square(list1):
    yield list1[0]**2
    yield list1[1] ** 2
    yield list1[2] ** 2
    yield list1[3] ** 2
    yield list1[4] ** 2
    yield list1[5] ** 2


input_list = [1, 2, 3, 4, 5, 6]
print("input list is:", input_list)
output = square(input_list)
print("Output from the generator is:", output)
print("Elements in the generator are:")
for i in output:
    print(i)

Output:

input list is: [1, 2, 3, 4, 5, 6]
Output from the generator is: <generator object square at 0x7fa421848a50>
Elements in the generator are:
1
4
9
16
25
36

You should keep in mind that when the generator is passed to the next() function after the last yield statement has executed, it raises a StopIteration exception. This can be handled by calling the next() function inside a Python try-except block.
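
As a minimal sketch, a fresh generator from the example above could be consumed safely like this, with the try-except ending the loop once the values run out:

output = square(input_list)
while True:
    try:
        print(next(output))
    except StopIteration:
        # The generator is exhausted; stop requesting values
        break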

Difference between yield and return 

There are two major differences between how the yield and return statements work in Python:

  • A return statement terminates the function; the statements after it never execute, and the function can send a value back to the caller only once. A yield statement merely pauses the function, which resumes from that point the next time a value is requested, so it can produce a series of values.
  • A function with a return statement sends its value back directly, while a function containing a yield statement returns a generator object, whose values are produced lazily, one at a time.

Conclusion

In this article, we have studied the yield and return statements in Python. We also looked at the differences between how yield and return work. To learn more about Python programming, you can read this article on list comprehension. You may also like this article on the linked list in Python.

The post Difference between yield and return in Python appeared first on PythonForBeginners.com.

September 22, 2021 02:31 PM UTC


Real Python

The Django Template Language: Tags and Filters

Django is a powerful framework for creating web applications in Python. Its features include database models, routing URLs, authentication, user management, administrative tools, and a template language. You can compose reusable HTML that changes based on the data you pass to the template language. Django templates use tags and filters to define a mini-language that’s similar to Python—but isn’t Python.

You’ll get to know Django templates through the tags and filters you use to compose reusable HTML.

In this tutorial, you’ll learn how to:

  • Write, compile, and render a Django template
  • Use the render() shortcut in views to quickly use templates
  • Use template tags for conditionals and loops in your templates
  • Create reusable templates with inheritance and inclusion
  • Modify the presentation of your data through template filters


Creating a Django Project

To experiment with Django templates, you’re going to need a project so that you can play around with the code. You’ll be building MoviePalace: the world’s smallest, simplest movie website. For a more detailed example of starting a new project, you can read Get Started With Django Part 1: Build a Portfolio App.

Django isn’t part of the standard Python library, so you’ll first need to install it. When dealing with third-party libraries, you should use a virtual environment. For a refresher on virtual environments, you can read over Python Virtual Environments: A Primer.

Once you have a virtual environment, run the following commands to get going:

 1$ python -m pip install django==3.2.5
 2$ django-admin startproject MoviePalace
 3$ cd MoviePalace
 4$ python manage.py startapp core

Line 1 installs Django into your virtual environment using pip. On line 2, the django-admin command creates a new Django project called MoviePalace. A Django project is composed of apps, where your code lives. The fourth command creates an app named core.

You’re almost ready to go. The last step is to tell Django about your newly created core app. You do this by editing the MoviePalace/settings.py file and adding "core" to the list of INSTALLED_APPS:

INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",
    "core",
]

With core registered as an app, you can now write a view containing a template.

Getting Ready to Use Django Templates

Django was created at a newspaper to help build web applications quickly. One of the goals of the framework was to separate the concerns of the business logic from the presentation logic.

Web designers, rather than Python programmers, frequently did the HTML development at the paper. Because of this, the developers decided not to allow the execution of Python within the template language. This decision simplified what the designers needed to know and sandboxed their code for security reasons. The end result was a separate mini-language. This approach is in contrast to the PHP approach, where the code is directly embedded in the HTML.

Compiling and Rendering Django Templates

Django templates let you dynamically change output content within a rendering context. You can think of templates as a form letter, where the letter’s contents include places where information can be inserted. You can run the rendering process multiple times with different data and get different results each time.

Django provides the Template and Context classes to represent the string template being rendered and the data being used during generation. The Context class is a wrapper to a dict and provides key-value pairs to populate the generated content. The result of a rendered template can be any text but is frequently HTML. Django is a web framework, after all.

It’s time to build your first template. To see one in action, you’ll first need a view. Add the following code to core/views.py:

 1# core/views.py
 2from django.http import HttpResponse
 3from django.template import Context, Template
 4
 5def citizen_kane(request):
 6    content = """{{movie}} was released in {{year}}"""
 7    template = Template(content)
 8    context = Context({"movie": "Citizen Kane", "year": 1941})
 9
10    result = template.render(context)
11    return HttpResponse(result)

In this view, you see some of the main concepts that make up the Django templating language:

  • Line 6 contains references to movie and year. This is similar to a Python f-string. The double braces, or mustache brackets, indicate the items that Django replaces when it renders the template.
  • Line 7 instantiates a Template object by passing in the string that specifies the template.
  • Line 8 creates a Context object by populating it with a dictionary. The Context object contains all of the data available to the template when Django renders it. The template contains two items to replace: {{movie}} with "Citizen Kane" and {{year}} with 1941.
  • Line 10 has the call to the .render() method that generates the result.
  • Line 11 returns the rendered content wrapped in an HttpResponse object.

To test this out, you’ll need to make this view available in the browser, so you’ll need to add a route. Modify MoviePalace/urls.py as follows:
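
A route for this view might look like the following minimal sketch; the URL path and pattern name here are assumptions, since the excerpt below is truncated.

# MoviePalace/urls.py
from django.urls import path

from core import views

urlpatterns = [
    path("citizen_kane/", views.citizen_kane, name="citizen_kane"),
]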

Read the full article at https://realpython.com/django-templates-tags-filters/ »



September 22, 2021 02:00 PM UTC


Mike Driscoll

Creating an MP3 Tagger GUI with wxPython

I don’t know about you, but I enjoy listening to music. As an avid music fan, I also like to rip my CDs to MP3 so I can listen to my music on the go more easily. There is still a lot of music that is unavailable to buy digitally. Unfortunately, when you rip a lot of music, you will sometimes end up with errors in the MP3 tags. Usually, there is a misspelling in a title or a track isn’t tagged with the right artist. While you can use many open source and paid programs to tag MP3 files, it’s also fun to write your own.

That is the topic of this article. In this article, you will write a simple MP3 tagging application. This application will allow you to view an MP3 file’s current tags as well as edit the following tags:

  • Artist
  • Album
  • Title
  • Year

The first step in your adventure is finding the right Python package for the job!


Finding an MP3 Package

There are several Python packages that you can use for editing MP3 tags. Here are a few that I found when I did a Google search:

  • eyeD3
  • mutagen

You will be using eyeD3 for this article. It has a nice API that is fairly straightforward. Frankly, I found the APIs for most of these packages to be brief and not all that helpful. However, eyeD3 seemed a bit more natural in the way it worked than the others that I tried, which is why it was chosen.

By the way, the package name, eyeD3, refers to the ID3 specification for metadata related to MP3 files.

However, the mutagen package is definitely a good fallback option because it supports many other types of audio metadata. If you happen to be working with other audio file types besides MP3, then you should definitely give mutagen a try.


Installing eyeD3

The eyeD3 package can be installed with pip. If you are using a virtual environment (venv or virtualenv), make sure you have it activated before you install eyeD3:

python3 -m pip install eyeD3

Once you have eyeD3 installed, you might want to check out its documentation:

Now let’s get started and make a neat application!


Designing the MP3 Tagger

Your first step is to figure out what you want the user interface to look like. You will need the following features to make a useful application:

  • A way to open a folder of MP3 files
  • A widget that displays the current tags of each MP3
  • A way to edit those tags

Here is a simple mockup of what the main interface might look like:

MP3 Tagger GUI Mockup


This user interface doesn’t show how to actually edit the MP3, but it implies that the user would need to press the button at the bottom to start editing. This seems like a reasonable way to start.

Let’s code the main interface first!


Creating the Main Application

Now comes the fun part, which is writing the actual application. You will be using ObjectListView for this example to display the MP3 metadata. Technically, you could use one of wxPython’s list control widgets instead. If you’d like a challenge, you should try changing the code in this article to use one of those.

Note: The code for this article can be downloaded on GitHub

Anyway, you can start by creating a file named main.py and entering the following:

# main.py

import eyed3
import editor
import glob
import wx

from ObjectListView import ObjectListView, ColumnDefn

class Mp3:

    def __init__(self, id3):
        self.artist = ''
        self.album = ''
        self.title = ''
        self.year = ''

        # Attempt to extract MP3 tags
        if not isinstance(id3.tag, type(None)):
            id3.tag.artist = self.normalize_mp3(
                id3.tag.artist)
            self.artist = id3.tag.artist
            id3.tag.album = self.normalize_mp3(
                id3.tag.album)
            self.album = id3.tag.album
            id3.tag.title = self.normalize_mp3(
                id3.tag.title)
            self.title = id3.tag.title
            if hasattr(id3.tag, 'best_release_date'):
                if not isinstance(
                    id3.tag.best_release_date, type(None)):
                    self.year = self.normalize_mp3(
                        id3.tag.best_release_date.year)
                else:
                    id3.tag.release_date = 2019
                    self.year = self.normalize_mp3(
                        id3.tag.best_release_date.year)
        else:
            tag = id3.initTag()
            tag.release_date = 2019
            tag.artist = 'Unknown'
            tag.album = 'Unknown'
            tag.title = 'Unknown'
        self.id3 = id3
        self.update()

Here you have the imports you need. You also created a class called Mp3 which will be used by the ObjectListView widget. The first four instance attributes in this class are the metadata that will be displayed in your application and are defaulted to strings. The last instance attribute, id3, will be the object returned from eyed3 when you load an MP3 file into it.

Not all MP3s are created equal. Some have no tags whatsoever, and others may have only partial tags. Because of those issues, you will check to see if id3.tag exists. If it does not, then the MP3 has no tags and you will need to call id3.initTag() to add blank tags to it. If id3.tag does exist, then you will want to make sure that the tags you are interested in also exist. That is what the first part of the if statement attempts to do when it calls the normalize_mp3() method.

The other item here is that if there are no dates set, then the best_release_date attribute will return None. So you need to check that and set it to some default if it happens to be None.

Let’s go ahead and create the normalize_mp3() method now:

def normalize_mp3(self, tag):
    try:
        if tag:
            return tag
        else:
            return 'Unknown'
    except Exception:
        return 'Unknown'

This will check to see if the specified tag exists. If it does, it simply returns the tag's value. If it does not, it returns the string 'Unknown' instead.

The last method you need to implement in the Mp3 class is update():

def update(self):
    self.artist = self.id3.tag.artist
    self.album = self.id3.tag.album
    self.title = self.id3.tag.title
    self.year = self.id3.tag.best_release_date.year

This method is called at the end of the Mp3 class's __init__() method. It is used to update the instance attributes after the tags for the MP3 file have been initialized or normalized.

There may be some edge cases that this method and the __init__() method will not catch. You are encouraged to enhance this code yourself to see if you can figure out how to fix those kinds of issues.

Now let’s go ahead and create a subclass of wx.Panel called TaggerPanel:

class TaggerPanel(wx.Panel):

    def __init__(self, parent):
        super().__init__(parent)
        self.mp3s = []
        main_sizer = wx.BoxSizer(wx.VERTICAL)

        self.mp3_olv = ObjectListView(
            self, style=wx.LC_REPORT | wx.SUNKEN_BORDER)
        self.mp3_olv.SetEmptyListMsg("No Mp3s Found")
        self.update_mp3_info()
        main_sizer.Add(self.mp3_olv, 1, wx.ALL | wx.EXPAND, 5)

        edit_btn = wx.Button(self, label='Edit Mp3')
        edit_btn.Bind(wx.EVT_BUTTON, self.edit_mp3)
        main_sizer.Add(edit_btn, 0, wx.ALL | wx.CENTER, 5)

        self.SetSizer(main_sizer)

The TaggerPanel is nice and short. Here you set up an instance attribute called mp3s that is initialized as an empty list. This list will eventually hold instances of your Mp3 class. You also create your ObjectListView instance here and add a button for editing MP3 files.

Speaking of editing, let’s create the event handler for editing MP3s:

def edit_mp3(self, event):
    selection = self.mp3_olv.GetSelectedObject()
    if selection:
        with editor.Mp3TagEditorDialog(selection) as dlg:
            dlg.ShowModal()
            self.update_mp3_info()

Here you will use the GetSelectedObject() method from the ObjectListView widget to get the selected MP3 that you want to edit. Then you make sure that you got a valid selection and open an editor dialog, which is contained in the editor module that you will write soon. The dialog accepts a single argument: the selected Mp3 object, which you are calling selection here.

Note that you will need to call update_mp3_info() to apply any updates you made to the MP3’s tags in the editor dialog.

Now let’s learn how to load a folder that contains MP3 files:

def load_mp3s(self, path):
    if self.mp3s:
        # clear the current contents
        self.mp3s = []
    mp3_paths = glob.glob(path + '/*.mp3')
    for mp3_path in mp3_paths:
        id3 = eyed3.load(mp3_path)
        mp3_obj = Mp3(id3)
        self.mp3s.append(mp3_obj)
    self.update_mp3_info()

In this example, you take in a folder path and use Python’s glob module to search it for MP3 files. Assuming that you find the files, you then loop over the results and load them into eyed3. Then you create an instance of your Mp3 class so that you can show the user the MP3’s metadata. To do that, you call the update_mp3_info() method. The if statement at the beginning of the method is there to clear out the mp3s list so that you do not keep appending to it indefinitely.

Let’s go ahead and create the update_mp3_info() method now:

def update_mp3_info(self):
    self.mp3_olv.SetColumns([
        ColumnDefn("Artist", "left", 100, "artist"),
        ColumnDefn("Album", "left", 100, "album"),
        ColumnDefn("Title", "left", 150, "title"),
        ColumnDefn("Year", "left", 100, "year")
    ])
    self.mp3_olv.SetObjects(self.mp3s)

The update_mp3_info() method is used for displaying MP3 metadata to the user. In this case, you will be showing the user the Artist, Album title, Track name (title) and the Year the song was released. To actually update the widget, you call the SetObjects() method at the end.

Now let’s move on and create the TaggerFrame class:

class TaggerFrame(wx.Frame):

    def __init__(self):
        super().__init__(
            None, title="Serpent - MP3 Editor")
        self.panel = TaggerPanel(self)
        self.create_menu()
        self.Show()

Here you create an instance of the aforementioned TaggerPanel class, create a menu and show the frame to the user. This is also where you would set the initial size of the application and the title of the application. Just for fun, I am calling it Serpent, but you can name the application whatever you want to.

Let’s learn how to create the menu next:

def create_menu(self):
    menu_bar = wx.MenuBar()
    file_menu = wx.Menu()
    open_folder_menu_item = file_menu.Append(
        wx.ID_ANY, 'Open Mp3 Folder', 'Open a folder with MP3s'
    )
    menu_bar.Append(file_menu, '&File')
    self.Bind(wx.EVT_MENU, self.on_open_folder,
              open_folder_menu_item)
    self.SetMenuBar(menu_bar)

In this small piece of code, you create a menubar object. Then you create the file menu with a single menu item that you will use to open a folder on your computer. This menu item is bound to an event handler called on_open_folder(). To show the menu to the user, you will need to call the frame’s SetMenuBar() method.

The last piece of the puzzle is to create the on_open_folder() event handler:

def on_open_folder(self, event):
    with wx.DirDialog(self, "Choose a directory:",
                      style=wx.DD_DEFAULT_STYLE,
                      ) as dlg:
        if dlg.ShowModal() == wx.ID_OK:
            self.panel.load_mp3s(dlg.GetPath())

You will want to open a wx.DirDialog here using Python’s with statement and show it modally to the user. This prevents the user from interacting with your application while they choose a folder. If the user presses the OK button, you will call the panel instance’s load_mp3s() method with the path that they have chosen.

For completeness, here is how you will run the application:

if __name__ == '__main__':
    app = wx.App(False)
    frame = TaggerFrame()
    app.MainLoop()

You are always required to create a wx.App instance so that your application can respond to events.

Your application won’t run yet, though, as you haven’t created the editor module.

Let’s learn how to do that next!

Editing MP3s

Editing MP3s is the point of this application, so you definitely need to have a way to accomplish that. You could modify the ObjectListView widget so that you can edit the data there or you can open up a dialog with editable fields. Both are valid approaches. For this version of the application, you will be doing the latter.

Let’s get started by creating the Mp3TagEditorDialog class:

# editor.py

import wx

class Mp3TagEditorDialog(wx.Dialog):

    def __init__(self, mp3):
        title = f'Editing "{mp3.id3.tag.title}"'
        super().__init__(parent=None, title=title)

        self.mp3 = mp3
        self.create_ui()

Here you instantiate your class and grab the MP3’s title from its tag to make the title of the dialog refer to which MP3 you are editing. Then you set an instance attribute and call the create_ui() method to create the dialog’s user interface.

Let’s create the dialog’s UI now:

def create_ui(self):
    self.main_sizer = wx.BoxSizer(wx.VERTICAL)

    size = (200, -1)
    track_num = str(self.mp3.id3.tag.track_num[0])
    year = str(self.mp3.id3.tag.best_release_date.year)

    self.track_number = wx.TextCtrl(
        self, value=track_num, size=size)
    self.create_row('Track Number', self.track_number)

    self.artist = wx.TextCtrl(self, value=self.mp3.id3.tag.artist,
                              size=size)
    self.create_row('Artist', self.artist)

    self.album = wx.TextCtrl(self, value=self.mp3.id3.tag.album,
                             size=size)
    self.create_row('Album', self.album)

    self.title = wx.TextCtrl(self, value=self.mp3.id3.tag.title,
                             size=size)
    self.create_row('Title', self.title)

    btn_sizer = wx.BoxSizer()
    save_btn = wx.Button(self, label="Save")
    save_btn.Bind(wx.EVT_BUTTON, self.save)

    btn_sizer.Add(save_btn, 0, wx.ALL, 5)
    btn_sizer.Add(wx.Button(self, id=wx.ID_CANCEL), 0, wx.ALL, 5)
    self.main_sizer.Add(btn_sizer, 0, wx.CENTER)

    self.SetSizerAndFit(self.main_sizer)

Here you create a series of wx.TextCtrl widgets that you pass to a function called create_row(). You also add the “Save” button at the end and bind it to the save() event handler. Finally you add a “Cancel” button. The way you create the Cancel button is kind of unique. All you need to do is pass wx.Button a special id: wx.ID_CANCEL. This will add the right label to the button and automatically make it close the dialog for you without actually binding it to a function.

This is one of the conveniences built into the wxPython toolkit. As long as you don’t need to do anything special, this functionality is great.
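To see the idea in isolation, here is a minimal, hypothetical dialog that relies entirely on stock IDs. Neither button needs an event handler, because wxPython wires up the close behavior for you:

import wx

class StockButtonDialog(wx.Dialog):
    """A hypothetical dialog that only uses stock buttons."""

    def __init__(self):
        super().__init__(None, title='Stock button demo')
        sizer = wx.BoxSizer(wx.VERTICAL)
        # CreateButtonSizer builds platform-appropriate buttons from
        # stock IDs; Cancel (and OK) close the dialog without bindings
        sizer.Add(self.CreateButtonSizer(wx.OK | wx.CANCEL),
                  0, wx.ALL | wx.CENTER, 5)
        self.SetSizerAndFit(sizer)

Showing this dialog with ShowModal() returns wx.ID_OK or wx.ID_CANCEL depending on which button was pressed.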

Now let’s learn what to put into the create_row() method:

def create_row(self, label, text):
    sizer = wx.BoxSizer(wx.HORIZONTAL)
    row_label = wx.StaticText(self, label=label, size=(50, -1))
    widgets = [(row_label, 0, wx.ALL, 5),
               (text, 0, wx.ALL, 5)]
    sizer.AddMany(widgets)
    self.main_sizer.Add(sizer)

In this example, you create a horizontal sizer and an instance of wx.StaticText with the label that you passed in. Then you add both of these widgets to a list of tuples where each tuple contains the arguments you need to pass to the main sizer. This allows you to add multiple widgets to a sizer at once via the AddMany() method.

The last piece of code you need to create is the save() event handler:

def save(self, event):
    current_track_num = self.mp3.id3.tag.track_num
    if current_track_num:
        new_track_num = (int(self.track_number.GetValue()),
                         current_track_num[1])
    else:
        new_track_num = (int(self.track_number.GetValue()), 0)

    artist = self.artist.GetValue()
    album = self.album.GetValue()
    title = self.title.GetValue()

    self.mp3.id3.tag.artist = artist if artist else 'Unknown'
    self.mp3.id3.tag.album = album if album else 'Unknown'
    self.mp3.id3.tag.title = title if title else 'Unknown'
    self.mp3.id3.tag.track_num = new_track_num
    self.mp3.id3.tag.save()
    self.mp3.update()
    self.Close()

Here you check if the track number was set in the MP3’s tag. If it was, then you update it to the new value that you entered. On the other hand, if the track number is not set, then you need to create the tuple yourself. The first number in the tuple is the track number and the second is the total number of tracks on the album. If the track number is not set, then you can’t reliably determine the total number of tracks programmatically, so you just set it to zero by default.

The rest of the function sets the MP3 object’s various tag attributes to what is in the dialog’s text controls. Once all the attributes are set, you call the save() method on the eyed3 MP3 object, tell the Mp3 class instance to update itself, and close the dialog. Note that if you try to pass in an empty value for artist, album or title, it will be replaced with the string 'Unknown'.

Now you have all the pieces that you need and you should be able to run the program.

Here is what the main application looked like on my machine:

MP3 Tagger GUI

And here is what the editor dialog looked like:

MP3 Editor dialog

Now let’s learn how to add a few enhancements to your program!

 

Adding New Features

Most applications of this type will allow the user to drag-and-drop files or folders onto them. They also usually have a toolbar for opening folders in addition to the menu. You learned how to do both of these in the previous chapter. You will now add these features to this program as well.

Let’s start by adding a DropTarget class to main.py:

import os

class DropTarget(wx.FileDropTarget):

    def __init__(self, window):
        super().__init__()
        self.window = window

    def OnDropFiles(self, x, y, filenames):
        self.window.update_on_drop(filenames)
        return True

Adding the drag-and-drop feature requires you to sub-class wx.FileDropTarget. You need to pass in the widget that will be the drop target as well. In this case, you want the wx.Panel to be the drop target. Then you override OnDropFiles so that it calls the update_on_drop() method. This is a new method that you will be adding shortly.

But before you do that, you need to update the beginning of your TaggerPanel class:

class TaggerPanel(wx.Panel):

    def __init__(self, parent):
        super().__init__(parent)
        self.mp3s = []
        drop_target = DropTarget(self)
        self.SetDropTarget(drop_target)
        main_sizer = wx.BoxSizer(wx.VERTICAL)      

Here you create an instance of DropTarget and then set the panel as the drop target via the SetDropTarget() method. The benefit of doing this is that now you can drag and drop files or folders pretty much anywhere on your application and it will work.

Note that the above code is not the full code for the __init__() method, but only shows the changes in context. See the source code on GitHub for the full version.

The first new method to look at is add_mp3():

def add_mp3(self, path):
    id3 = eyed3.load(path)
    mp3_obj = Mp3(id3)
    self.mp3s.append(mp3_obj)

Here you pass in the path of the MP3 file that you want to add to the user interface. It will take that path and load it with eyed3 and add that to your mp3s list.

The edit_mp3() method is unchanged for this version of the application, so it is not reproduced here.

Now let’s move on and create another new method called find_mp3s():

def find_mp3s(self, folder):
    mp3_paths = glob.glob(folder + '/*.mp3')
    for mp3_path in mp3_paths:
        self.add_mp3(mp3_path)

This code and the code in the add_mp3() method might look a bit familiar to you. It comes from the load_mp3s() method that you created earlier. You are moving this bit of code into its own function, which is known as refactoring. There are many reasons to refactor your code. In this case, you are doing so because you will need to call this function from multiple places. Rather than copying the code into multiple functions, it is almost always better to separate it into its own function that you can call.

Now let’s update the load_mp3s() method so that it calls the new one above:

def load_mp3s(self, path):
    if self.mp3s:
        # clear the current contents
        self.mp3s = []
    self.find_mp3s(path)
    self.update_mp3_info()

The core of this method has been reduced to two lines of code. The first calls the find_mp3s() method that you just wrote, while the second calls update_mp3_info(), which updates the user interface (i.e. the ObjectListView widget).

The DropTarget class is calling the update_on_drop() method, so let’s write that now:

def update_on_drop(self, paths):
    for path in paths:
        if os.path.isdir(path):
            self.load_mp3s(path)
        elif os.path.isfile(path):
            self.add_mp3(path)
            self.update_mp3_info()

The update_on_drop() method is the reason you did the refactoring earlier. It also needs to call the load_mp3s(), but only when the path that is passed in is determined to be a directory. Otherwise you check to see if the path is a file and load it up.

But wait! There’s an issue with the code above. Can you tell what it is?

The problem is that when the path is a file, you aren’t checking to see if it is an MP3. If you run this code as is, you will cause an exception to be raised as the eyed3 package will not be able to turn all file types into Mp3 objects.

Let’s fix that issue:

def update_on_drop(self, paths):
    for path in paths:
        _, ext = os.path.splitext(path)
        if os.path.isdir(path):
            self.load_mp3s(path)
        elif os.path.isfile(path) and ext.lower() == '.mp3':
            self.add_mp3(path)
            self.update_mp3_info()

You can use Python’s os module to get the extension of a file using the splitext() function. It returns a tuple that contains two items: the path without the extension, and the extension itself.

Now that you have the extension, you can check to see if it is .mp3 and only update the UI if it is. By the way, the splitext() function returns an empty string when you pass it a directory path.
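Here is a quick demonstration of that behavior (the paths are made up):

import os

print(os.path.splitext('/music/song.mp3'))     # ('/music/song', '.mp3')
print(os.path.splitext('/music/some_folder'))  # ('/music/some_folder', '')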

The next bit of code that you need to update is the TaggerFrame class so that you can add a toolbar:

class TaggerFrame(wx.Frame):

    def __init__(self):
        super().__init__(
            None, title="Serpent - MP3 Editor")
        self.panel = TaggerPanel(self)
        self.create_menu()
        self.create_tool_bar()
        self.Show()

The only change to the code above is the added call to the create_tool_bar() method. You will almost always want to create the toolbar in a separate method, as there are typically several lines of code per toolbar button. For applications with many toolbar buttons, you should probably separate that code out even more and put it into a class or module of its own.

Let’s go ahead and write that method:

def create_tool_bar(self):
    self.toolbar = self.CreateToolBar()
    
    add_folder_ico = wx.ArtProvider.GetBitmap(
        wx.ART_FOLDER_OPEN, wx.ART_TOOLBAR, (16, 16))
    add_folder_tool = self.toolbar.AddTool(
        wx.ID_ANY, 'Add Folder', add_folder_ico,
        'Open a folder of MP3s')
    self.Bind(wx.EVT_MENU, self.on_open_folder,
              add_folder_tool)
    self.toolbar.Realize()

To keep things simple, you add a single toolbar button that will open a directory dialog via the on_open_folder() method.

When you run this code, your updated application should now look like this:

MP3 Tagger GUI (empty)

Feel free to add more toolbar buttons, menu items, a status bar or other fun enhancements to this application.

 

Wrapping Up

This article taught you a little about some of Python’s MP3-related packages that you can use to edit MP3 tags, as well as tags for other music file formats. You learned how to create a nice main application that opens an editing dialog. The main application displays relevant MP3 metadata to the user. It also shows the user their updates should they decide to edit one or more tags.

The wxPython toolkit has support for playing back certain types of audio file formats, including MP3. You could create an MP3 player using these capabilities and make this application a part of that.

Download the Source

You can download the source code for the examples in this article on GitHub

Related Articles

Want to learn more about what you can create with wxPython? Check out the following articles:

The post Creating an MP3 Tagger GUI with wxPython appeared first on Mouse Vs Python.

September 22, 2021 12:30 PM UTC


Python Bytes

#251 A 95% complete episode (wait for it)

Watch the live stream: Watch on YouTube

About the show

Sponsored by us: check out the courses over at Talk Python, and Brian’s book too!

Special guest: Brett Cannon

Michael #1: auto-optional

- by Daan Luttik
- Did you know that concrete types cannot be None in Python typing? This is wrong:

    def do_a_thing(extra_info: str = None): ...

- auto-optional will fix it:

    def do_a_thing(extra_info: Optional[str] = None): ...

- Why would you want this? You can easily modify external libraries that didn’t pay attention to proper use of Optional to improve mypy linting, and force consistency in your own code base by enforcing that a None default implies an Optional type.
- Run via the CLI: auto-optional [path]

Brian #2: Making World-Class Docs Takes Effort

- Daniel Stenberg
- Six requirements for a project to get a gold star: docs in the code repo; NOT extracted from the code; examples, lots of examples, more than you think you need; document every API call you provide; easily accessible and browsable, and hopefully offline readable as well; easy to contribute to.
- Non-stop iterating is key to having good docs.
- Extra goodness: consistency for section titles, and cross-references.
- Brian would add: check for grammar and spelling mistakes; be consistent in all things (formatting, style, tone, depth of info across topics); don’t be afraid to have a personality (docs with easter eggs, fun examples, and tasteful jokes are nice, as long as the fun stuff doesn’t complicate the docs); and don’t slam projects for having bad docs. Not all open source projects exist for your benefit. You can make them better by contributing. :)

Brett #3: Starship

- Continuing the trend of stuff to help make your coding better, Python or not.
- Also makes Michael’s new love of nerd fonts more useful, and brings more Rust to the show, as Paul Everitt says Brett must do.
- Gives you a common shell prompt no matter which shell you use; Brett also finds it easy to set up compared to configuring most shells’ own prompts.
- Lots of integrated support for various developer things, such as printing which Python version you have when the directory has a pyproject.toml file.
- Works nicely with the Python Launcher (as mentioned the last time Brett was on), and has some pyenv support that Brett doesn’t use.

Michael #4: JMESPath

- via Josh Thurston, who spent tons of time figuring out how to parse pretty-printed results with layers of nested dictionaries and lists; this module saved him time in a big way.
- JMESPath (pronounced “james path”) allows you to declaratively specify how to extract elements from a JSON document.
- For example, given the document {"foo": {"bar": "baz"}}, the jmespath expression foo.bar will return “baz”.
- It even works with projection-like results: given {"foo": {"bar": [{"name": "one"}, {"name": "two"}]}}, the expression foo.bar[*].name will return ["one", "two"].
- Negative indexing is also supported (-1 refers to the last element in the list). Given the data above, the expression foo.bar[-1].name will return "two".

Brian #5: pedalboard - audio effects library

- from Spotify
- The “power, speed, and sound quality of a DAW”, but in Python.
- Introduction article (warning: a weird color-changing header image that is painful to look at, so scroll past it quickly).
- Built-in support for a number of basic audio transformations: Convolution, Compressor, Chorus, Distortion, Gain, HighpassFilter, LadderFilter, Limiter, LowpassFilter, Phaser, and Reverb.

Brett #6: PEP 665 (and the journey so far)

- An attempt to standardize lock files for Python.
- Brett spent six months talking with folks privately to come up with the first public draft.
- It was initially a strict lock file, but Poetry and PDM feedback was that platform-agnostic locking was important, so the proposal morphed to cover that.
- Taking it public led to over 150 comments on Discourse; people disliked it, from the title to the explanation to the proposed problem space to the actual solution.
- Brett has gone back to the drawing board privately, with one of the original objectors participating; it looks like they are reaching a good consensus on how to frame things and how it should ultimately look.
- (Packaging) PEPs are hard.

Extras

Brian: Python is popular, apparently, and “on the verge of another big step forward”: “It only needs to bridge 0.16% to surpass C. This might happen any time now. If Python becomes number 1, a new milestone has been reached in the TIOBE index. Only 2 other languages have ever been leading the pack so far, i.e. C and Java.”

Michael: Nerd Fonts, the Evrone interview with Michael, Henry Schreiner’s Fish setup, and aliases rather than CLIs/venvs.

Brett: Will McGugan did a webinar with Paul Everitt about Textual (because it’s not a Python Bytes episode if Will’s name is not brought up). The Python Launcher officially launched (last covered 30 episodes ago): it’s available in AUR, Fedora, and Homebrew (both macOS and Linux), with no reported bugs since launch. Brett is still doing his syntactic sugar blog posts. And the Python extension for VS Code has a refreshed testing UX; “we’re coming for you, Brian.”

Joke: Last 5%

September 22, 2021 08:00 AM UTC


Brett Cannon

Unravelling comprehensions

After failing to unravel generator expressions, in this post in my Python syntactic sugar series I want to tackle comprehensions. Thanks to a change made in Python 3.0, recreating comprehensions using generator expressions is straightforward, since comprehensions no longer leak their loop variable.
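Here is a quick demonstration of that Python 3 change (the variable names are arbitrary):

b = 'outer'
doubled = [b * 2 for b in range(3)]
print(doubled)  # [0, 2, 4]
print(b)        # 'outer' -- the comprehension's b did not leak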

List comprehensions

Unravelling [c for b in a] is list(c for b in a): take the body of the list comprehension and pass it to list() as a generator expression.

Set comprehensions

Unravelling {c for b in a} is set(c for b in a). I suspect you're starting to see a pattern...

Dict comprehensions

There's a slight trick to dict comprehensions in that the generator expression needs to produce key/value pairs; otherwise it's the same "pass a generator to the type's constructor" trick. That means {c: d for b in a} becomes dict((c, d) for b in a).
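If you want to convince yourself of these equivalences, a quick sketch like this checks all three (the iterable and expression here are arbitrary choices of mine):

a = range(5)
assert [b * 2 for b in a] == list(b * 2 for b in a)
assert {b * 2 for b in a} == set(b * 2 for b in a)
assert {b: b * 2 for b in a} == dict((b, b * 2) for b in a)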

Aside: why isn't there a tuple comprehension?

There are two reasons why there is no such thing as a "tuple comprehension".

One is syntactic: how would you represent it? The obvious choice would be parentheses around comprehension-like syntax, but that's taken by generator expressions.

The second reason is that a "tuple comprehension" would be a misuse of tuples. The data structure is meant to provide an index-based struct rather than an immutable sequence. Consider a pair of key and value like we did for dict comprehensions: (key, value). What's represented by index 0 means something different than what is in index 1.

September 22, 2021 12:10 AM UTC


Python⇒Speed

Scanning your Conda environment for security vulnerabilities

You don’t want to deploy an application that has security vulnerabilities. That means your own code, but also third-party dependencies: it doesn’t matter how secure your code is if it’s exposing a TLS socket with a version of OpenSSL that has a remote code execution vulnerability.

For pip-based Python applications, you’d usually run vulnerability scanners on Python dependencies like Django, and on system packages like OpenSSL. With Conda, however, the situation is a little different: Conda combines both types of packages into one place. In addition, most vulnerability scanners don’t support Conda.

Let’s see what makes Conda different, and how you can scan packages for known vulnerabilities.

Read more...

September 22, 2021 12:00 AM UTC

September 21, 2021


PyCoder’s Weekly

Issue #491 (Sept. 21, 2021)

#491 – SEPTEMBER 21, 2021
View in Browser »



Structural Pattern Matching in Python 3.10

“Python 3.10, which is due out in early October 2021, will include a large new language feature called structural pattern matching. This article is a critical but (hopefully) informative presentation of the feature, with examples based on real-world code.”
BEN HOYT

Build a Personal Diary With Django and Python

In this beginner-friendly tutorial, you’ll build a personal diary app in Django. You’ll use the strengths of the Django web framework and learn the basics of web development with Python.
REAL PYTHON

Get Deeper Insights Into Your Python Code-Level Performance and Reduce End-User Latency


Datadog APM enables you to detect the methods that consume the most CPU, memory, and time under real workloads, allowing you to optimize code in real time to reduce end-user latency and cloud provider costs. Optimize your Python app performance at any scale. See for yourself with a free trial →
DATADOG sponsor

(Not) Unravelling Generator Expressions

What does that look like if you take away the Python “magic” of generator expressions and unravel them down to their core Python semantics?
BRETT CANNON

Python Anti-Pattern: Using a Mutable Default Value as an Argument

A debugging post-mortem.
VALI VOICU

Better PyPy JIT Support for Auto-Generated Python Code

CARL FRIEDRICH BOLZ-TEREICK

Debugging by Starting a REPL at a Breakpoint Is Fun

JULIA EVANS

Django 4.0 to Get a Built-in Redis Cache Backend

GITHUB.COM/DJANGO

Discussions

quit(), exit(), sys.exit(), os._exit(): The Differences and Do They Matter?

REDDIT

I Think I Have Installed a Dodgy Pip Package…

REDDIT

Python Jobs

Senior Software Engineer (Washington D.C.)

Quorum

Senior Backend Software Engineer (Remote)

Clay

More Python Jobs >>>

Articles & Tutorials

Programming Languages: Python Is on the Verge of Another Big Step Forward

Python could soon take first place in one more programming language popularity ranking: “Python has never been so close to the number 1 position of the TIOBE index,” writes Paul Jansen, chief of Tiobe software. “It only needs to bridge 0.16% to surpass C. This might happen any time now. If Python becomes number 1, a new milestone has been reached in the TIOBE index. Only 2 other languages have ever been leading the pack so far, i.e. C and Java.”
LIAM TUNG

Pass by Reference in Python: Best Practices

In this course, you’ll explore the concept of passing by reference and learn how it relates to Python’s own system for handling function arguments. You’ll look at several use cases for passing by reference and learn some best practices for implementing pass-by-reference constructs in Python.
REAL PYTHON course

Rev APIs Solve All of Your Speech-to-Text Needs


Rev.ai is the most sophisticated automatic speech recognition in the world. Our speech-to-text APIs are more accurate, easier to use, and have less bias than competitors like Google, Amazon, and Microsoft. Try Rev.ai free for five hours right now →
REV.AI sponsor

Using the “and” Boolean Operator in Python

In this step-by-step tutorial, you’ll learn how Python’s “and” operator works and how to use it in your code. You’ll get to know its special features and see what kind of programming problems you can solve by using “and” in Python.
REAL PYTHON

Validating and Formatting Phone Numbers in Python With phonenumbers

This article explains how to validate phone numbers using the phonenumbers library. It also shows how to extract meta information (such as carrier and geographic data) from a phone number.
RUSLAN HASANOV • Shared by Ruslan Hasanov

How to Send Emails With Python

Learn how to send emails with Python using the smtplib and email modules. You’ll also learn how to send attachments.
MIKE DRISCOLL

DataStax Astra DB – Built on Apache Cassandra. Get 80 Gigabytes of Storage Free Every Month

Need global scale on a startup budget? DataStax Astra DB is a multi-cloud DBaaS built on Apache Cassandra. Painless APIs, always free for developers, and no credit card required.
DATASTAX sponsor

Introduction to Django slippers: Reusable Components

Slippers aims to augment Django’s template language with convenience features for writing reusable components.
MITCHEL CABULOY

How I Patched Python to Include Ruby’s Inline “if”

What the author learned from adding “else-less” functionality to Python, as inspired by Ruby.
MIGUEL BRITO

Language Translation and OCR With Tesseract and Python

ADRIAN ROSEBROCK

Using OpenBSD’s pledge and unveil Syscalls From Python

CHRIS WELLONS

Improving Python Dependency Management With pipx and Poetry

CEDA EI

Projects & Code

eacc: Minimalist but Flexible Lexer/Parser Tool in Python

GITHUB.COM/IOGF

python-goto: Function Decorator That Rewrites the Bytecode to Enable goto in Python

GITHUB.COM/SNOACK

django-cacheops: ORM Cache With Automatic Granular Event-Driven Invalidation

GITHUB.COM/SUOR

django-upgrade: Automatically Upgrade Your Django Projects

GITHUB.COM/ADAMCHAINZ

Learn Python Through Nursery Rhymes & Fairy Tales (Kickstarter)

SHARI ESKENAS

slippers: Build Reusable Components in Django Without Writing a Single Line of Python

GITHUB.COM/MIXXORZ

Events

Real Python Office Hours (Virtual)

September 22, 2021
REALPYTHON.COM

PyDelhi User Group Meetup

September 25, 2021
MEETUP.COM

PythOnRio Meetup

September 25, 2021
PYTHON.ORG.BR

Introduction to the Python Programming Language (In Persian)

September 28, 2021
INSTAGRAM.COM

Dominican Republic Python User Group

September 28 to September 29, 2021
PYTHON.DO


Happy Pythoning!
This was PyCoder’s Weekly Issue #491.
View in Browser »


[ Subscribe to 🐍 PyCoder’s Weekly 💌 – Get the best Python news, articles, and tutorials delivered to your inbox once a week >> Click here to learn more ]

September 21, 2021 07:30 PM UTC


Django Weblog

Django 4.0 alpha 1 released

Django 4.0 alpha 1 is now available. It represents the first stage in the 4.0 release cycle and is an opportunity for you to try out the changes coming in Django 4.0.

Django 4.0 has an abundance of new features, which you can read about in the in-development 4.0 release notes.

This alpha milestone marks the feature freeze. The current release schedule calls for a beta release in about a month and a release candidate about a month from then. We'll only be able to keep this schedule if we get early and often testing from the community. Updates on the release schedule are available on the django-developers mailing list.

As with all alpha and beta packages, this is not for production use. But if you'd like to take some of the new features for a spin, or to help find and fix bugs (which should be reported to the issue tracker), you can grab a copy of the alpha package from our downloads page or on PyPI.

The PGP key ID used for this release is Mariusz Felisiak: 2EF56372BA48CD1B.

September 21, 2021 07:10 PM UTC


Real Python

Pass by Reference in Python: Best Practices

After gaining some familiarity with Python, you may notice cases in which your functions don’t modify arguments in place as you might expect, especially if you’re familiar with other programming languages. Some languages handle function arguments as references to existing variables, which is known as pass by reference. Other languages handle them as independent values, an approach known as pass by value.

If you’re an intermediate Python programmer who wishes to understand Python’s peculiar way of handling function arguments, then this course is for you. You’ll implement real use cases of pass-by-reference constructs in Python and learn several best practices to avoid pitfalls with your function arguments.

In this course, you’ll learn:


[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

September 21, 2021 02:00 PM UTC


Mike Driscoll

How to Send Emails with Python

Python provides a couple of really nice modules that you can use to craft emails: email and smtplib. Instead of going over their various methods, you’ll spend some time learning how to actually use them.

Specifically, you’ll be covering the following:

Let’s get started!

Email Basics – How to Send an Email with smtplib

The smtplib module is very intuitive to use. You will write a quick example that shows how to send an email.

Open up your favorite Python IDE or text editor and create a new Python file. Add the following code to that file and save it:

import smtplib

HOST = "mySMTP.server.com"
SUBJECT = "Test email from Python"
TO = "mike@someAddress.org"
FROM = "python@mydomain.com"
text = "Python 3.4 rules them all!"

BODY = "\r\n".join((
    "From: %s" % FROM,
    "To: %s" % TO,
    "Subject: %s" % SUBJECT,
    "",
    text
))

server = smtplib.SMTP(HOST)
server.sendmail(FROM, [TO], BODY)
server.quit()

Here you import only the smtplib module. Two-thirds of this code is used for setting up the email. Most of the variables are pretty self-explanatory, so you’ll focus only on the odd one out: BODY.

Here you use the string’s join() method to combine all the previous variables into a single string where each line ends with a carriage return (“\r”) plus a new line (“\n”). If you print BODY out, it will look like this:

'From: python@mydomain.com\r\nTo: mike@someAddress.org\r\nSubject: Test email from Python\r\n\r\nPython 3.4 rules them all!'

After that, you set up a server connection to your host and then you call the smtplib module’s sendmail method to send the email. Then you disconnect from the server. You will note that this code doesn’t have a username or password in it. If your server requires authentication, then you’ll need to add the following code:

server.login(username, password)

This should be added right after you create the server object. Normally, you would want to put this code into a function and call it with some of these parameters. You might even want to put some of this information into a config file.

Let’s put this code into a function.

import smtplib

def send_email(host, subject, to_addr, from_addr, body_text):
    """
    Send an email
    """
    BODY = "\r\n".join((
            "From: %s" % from_addr,
            "To: %s" % to_addr,
            "Subject: %s" % subject ,
            "",
            body_text
            ))
    server = smtplib.SMTP(host)
    server.sendmail(from_addr, [to_addr], BODY)
    server.quit()

if __name__ == "__main__":
    host = "mySMTP.server.com"
    subject = "Test email from Python"
    to_addr = "mike@someAddress.org"
    from_addr = "python@mydomain.com"
    body_text = "Python rules them all!"
    send_email(host, subject, to_addr, from_addr, body_text)

Now you can see how small the actual code is by just looking at the function itself. That’s 13 lines! And you could make it shorter if you didn’t put every item in the BODY on its own line, but it wouldn’t be as readable. Now you’ll add a config file to hold the server information and the from address.

Why would you do that? Many organizations use different email servers to send email or if the email server gets upgraded and the name changes, then you only need to change the config file rather than the code. The same thing could apply to the from address if your company was bought and merged into another.

Let’s take a look at the config file (save it as email.ini):

[smtp]
server = some.server.com
from_addr = python@mydomain.com

That is a very simple config file. In it, you have a section labeled smtp with two items: server and from_addr. You’ll use ConfigParser to read this file and turn it into a Python dictionary. Here’s the updated version of the code (save it as smtp_config.py):

import os
import smtplib
import sys

from configparser import ConfigParser

def send_email(subject, to_addr, body_text):
    """
    Send an email
    """
    base_path = os.path.dirname(os.path.abspath(__file__))
    config_path = os.path.join(base_path, "email.ini")

    if os.path.exists(config_path):
        cfg = ConfigParser()
        cfg.read(config_path)
    else:
        print("Config not found! Exiting!")
        sys.exit(1)

    host = cfg.get("smtp", "server")
    from_addr = cfg.get("smtp", "from_addr")

    BODY = "\r\n".join((
        "From: %s" % from_addr,
        "To: %s" % to_addr,
        "Subject: %s" % subject ,
        "",
        body_text
    ))
    server = smtplib.SMTP(host)
    server.sendmail(from_addr, [to_addr], BODY)
    server.quit()

if __name__ == "__main__":
    subject = "Test email from Python"
    to_addr = "mike@someAddress.org"
    body_text = "Python rules them all!"
    send_email(subject, to_addr, body_text)

You have added a little check to this code. You want to first grab the path that the script itself is in, which is what base_path represents. Next, you combine that path with the file name to get a fully qualified path to the config file. You then check for the existence of that file.

If it’s there, you create a ConfigParser, and if it’s not, you print a message and exit the script. You should add an exception handler around the ConfigParser.read() call just to be on the safe side, though, as the file could exist but be corrupt, which would raise an exception when parsed.

That will be a little project that you can attempt on your own. Anyway, let’s say that everything goes well and the ConfigParser object is created successfully. Now you can extract the host and from_addr information using the usual ConfigParser syntax.
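If you want a head start on that little project, a minimal sketch might look like this. It reuses config_path from the code above, and note that read() skips files it cannot open, so its return value is worth checking too:

import sys
from configparser import ConfigParser, Error as ConfigParserError

cfg = ConfigParser()
try:
    # read() returns the list of files it actually parsed
    read_ok = cfg.read(config_path)
except ConfigParserError:
    print("Config file is corrupt! Exiting!")
    sys.exit(1)
if not read_ok:
    print("Config not found or not readable! Exiting!")
    sys.exit(1)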

Now you’re ready to learn how to send multiple emails at the same time!

Sending Multiple Emails at Once

Being able to send multiple emails at once is a nice feature to have.

Go ahead and modify your last example a little so you can send multiple emails!

import os
import smtplib
import sys

from configparser import ConfigParser

def send_email(subject, body_text, emails):
    """
    Send an email
    """
    base_path = os.path.dirname(os.path.abspath(__file__))
    config_path = os.path.join(base_path, "email.ini")

    if os.path.exists(config_path):
        cfg = ConfigParser()
        cfg.read(config_path)
    else:
        print("Config not found! Exiting!")
        sys.exit(1)

    host = cfg.get("smtp", "server")
    from_addr = cfg.get("smtp", "from_addr")

    BODY = "\r\n".join((
            "From: %s" % from_addr,
            "To: %s" % ', '.join(emails),
            "Subject: %s" % subject ,
            "",
            body_text
            ))
    server = smtplib.SMTP(host)
    server.sendmail(from_addr, emails, BODY)
    server.quit()

if __name__ == "__main__":
    emails = ["mike@someAddress.org", "someone@gmail.com"]
    subject = "Test email from Python"
    body_text = "Python rules them all!"
    send_email(subject, body_text, emails)

You’ll notice that in this example, you removed the to_addr parameter and added an emails parameter, which is a list of email addresses. To make this work, you need to create a comma-separated string in the To: portion of the BODY and also pass the email list to the sendmail() method. Thus you do the following to create a simple comma-separated string: ', '.join(emails). Simple, huh?

Send email using the TO, CC and BCC lines

Now you just need to figure out how to send using the CC and BCC fields.

Let’s create a new version of this code that supports that functionality!

import os
import smtplib
import sys

from configparser import ConfigParser

def send_email(subject, body_text, to_emails, cc_emails, bcc_emails):
    """
    Send an email
    """
    base_path = os.path.dirname(os.path.abspath(__file__))
    config_path = os.path.join(base_path, "email.ini")

    if os.path.exists(config_path):
        cfg = ConfigParser()
        cfg.read(config_path)
    else:
        print("Config not found! Exiting!")
        sys.exit(1)

    host = cfg.get("smtp", "server")
    from_addr = cfg.get("smtp", "from_addr")

    BODY = "\r\n".join((
            "From: %s" % from_addr,
            "To: %s" % ', '.join(to_emails),
            "CC: %s" % ', '.join(cc_emails),
            "BCC: %s" % ', '.join(bcc_emails),
            "Subject: %s" % subject ,
            "",
            body_text
            ))
    emails = to_emails + cc_emails + bcc_emails

    server = smtplib.SMTP(host)
    server.sendmail(from_addr, emails, BODY)
    server.quit()

if __name__ == "__main__":
    emails = ["mike@somewhere.org"]
    cc_emails = ["someone@gmail.com"]
    bcc_emails = ["schmuck@newtel.net"]

    subject = "Test email from Python"
    body_text = "Python rules them all!"
    send_email(subject, body_text, emails, cc_emails, bcc_emails)

In this code, you pass in three lists, each with one email address apiece. You create the CC and BCC fields exactly the same way as before, but you also need to combine the three lists into one so you can pass the combined list to the sendmail() method.

There is some talk on forums like Stack Overflow that some email clients may handle the BCC field in odd ways that allow the recipient to see the BCC list via the email headers. I am unable to confirm this behavior, but I do know that Gmail successfully strips the BCC information from the email header.
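In fact, with the code above the BCC addresses are written into the message headers themselves, so any client that shows raw headers could reveal them. A more cautious approach, sketched here using the same variable names from the function above, is to leave BCC recipients out of the headers entirely and only include them in the envelope list passed to sendmail():

BODY = "\r\n".join((
    "From: %s" % from_addr,
    "To: %s" % ', '.join(to_emails),
    "CC: %s" % ', '.join(cc_emails),
    "Subject: %s" % subject,
    "",
    body_text
))
# BCC recipients still get the message because they appear in the
# envelope (the second argument to sendmail), but never in a header
emails = to_emails + cc_emails + bcc_emails
server.sendmail(from_addr, emails, BODY)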

Now you’re ready to move on to using Python’s email module!

Add an attachment / body using the email module

Now you’ll take what you learned from the previous section and mix it together with the Python email module so that you can send attachments.

The email module makes adding attachments extremely easy. Here’s the code:

import os
import smtplib
import sys

from configparser import ConfigParser
from email import encoders
from email.mime.text import MIMEText
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.utils import formatdate

def send_email_with_attachment(subject, body_text, to_emails,
                               cc_emails, bcc_emails, file_to_attach):
    """
    Send an email with an attachment
    """
    base_path = os.path.dirname(os.path.abspath(__file__))
    config_path = os.path.join(base_path, "email.ini")
    header = 'Content-Disposition', 'attachment; filename="%s"' % file_to_attach

    # get the config
    if os.path.exists(config_path):
        cfg = ConfigParser()
        cfg.read(config_path)
    else:
        print("Config not found! Exiting!")
        sys.exit(1)

    # extract server and from_addr from config
    host = cfg.get("smtp", "server")
    from_addr = cfg.get("smtp", "from_addr")

    # create the message
    msg = MIMEMultipart()
    msg["From"] = from_addr
    msg["Subject"] = subject
    msg["Date"] = formatdate(localtime=True)
    if body_text:
        msg.attach( MIMEText(body_text) )

    msg["To"] = ', '.join(to_emails)
    msg["cc"] = ', '.join(cc_emails)

    attachment = MIMEBase('application', "octet-stream")
    try:
        with open(file_to_attach, "rb") as fh:
            data = fh.read()
        attachment.set_payload( data )
        encoders.encode_base64(attachment)
        attachment.add_header(*header)
        msg.attach(attachment)
    except IOError:
        msg = "Error opening attachment file %s" % file_to_attach
        print(msg)
        sys.exit(1)

    emails = to_emails + cc_emails

    server = smtplib.SMTP(host)
    server.sendmail(from_addr, emails, msg.as_string())
    server.quit()

if __name__ == "__main__":
    emails = ["mike@someAddress.org", "nedry@jp.net"]
    cc_emails = ["someone@gmail.com"]
    bcc_emails = ["anonymous@circe.org"]

    subject = "Test email with attachment from Python"
    body_text = "This email contains an attachment!"
    path = "/path/to/some/file"
    send_email_with_attachment(subject, body_text, emails, 
                               cc_emails, bcc_emails, path)

 

Here you have renamed your function and added a new argument, file_to_attach. You also need to add a header and create a MIMEMultipart object. The header could be created any time before you add the attachment.

You add elements to the MIMEMultipart object (msg) like you would keys to a dictionary. You’ll note that you have to use the email module’s formatdate method to insert the properly formatted date.

To add the body of the message, you need to create an instance of MIMEText. If you’re paying attention, you’ll see that you didn’t add the BCC information, but you could easily do so by following the conventions in the code above.

Next, you add the attachment. You wrap it in an exception handler and use the with statement to extract the file and place it in your MIMEBase object. Finally, you add it to the msg variable and you send it out. Notice that you have to convert the msg to a string in the sendmail() method.

Wrapping Up

Now you know how to send out emails with Python. For those of you that like mini-projects, you should go back and add additional error handling around the server.sendmail portion of the code in case something odd happens during the process.

One example would be an SMTPAuthenticationError or an SMTPConnectError. You could also beef up the error handling during the attachment of the file to catch other errors. Finally, you may want to take those various lists of emails and create one normalized list with the duplicates removed. This is especially important if you are reading a list of email addresses from a file.
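For the deduplication idea, one simple, order-preserving approach uses dict.fromkeys(). The addresses below are the same sample ones used earlier:

emails = ["mike@someAddress.org", "someone@gmail.com"]
cc_emails = ["someone@gmail.com"]
bcc_emails = ["anonymous@circe.org"]

# dict keys are unique and (since Python 3.7) keep insertion order
all_recipients = list(dict.fromkeys(emails + cc_emails + bcc_emails))
print(all_recipients)  # duplicates removed, original order kept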

Also, note that your from address is fake. You can spoof emails using Python and other programming languages, but that is very bad etiquette and possibly illegal depending on where you live. You have been warned!

Use your knowledge wisely and enjoy Python for fun and profit!

Related Reading

Want to learn more Python basics? Then check out the following tutorials:

 

The post How to Send Emails with Python appeared first on Mouse Vs Python.

September 21, 2021 12:30 PM UTC


Python for Beginners

Find the mirror image of a binary tree

Unlike the elements of a Python dictionary, list, or set, the elements of a binary tree are represented hierarchically. This hierarchy makes it possible to find a binary tree's mirror image, since each element has a fixed position. In this article, we will study the algorithm to find the mirror image of a binary tree. We will also implement the algorithm in Python and execute it on an example binary tree.

What is the mirror image of a binary tree?

The mirror image of a binary tree is another binary tree that can be created by swapping the left child and right child at each node of the tree. So, to find the mirror image of a binary tree, we just have to swap the left child and right child of each node. Let us try to find the mirror image of the following tree.

An example binary tree

To find the mirror image of the above tree, we will start from the root and swap children of each node.

At the root, we will swap the left and right children of the binary tree. In this way, 20, 11, and 22 will move into the right subtree of the binary tree and 53, 52, and 78 will move to the left, as follows.

Then we will move to the next level and swap the children of 53. Here, 78 will become the left child of 53 while 52 will become the right child of 53. Similarly, we will swap the left child and right child of 20. In this way, 22 will become the left child of 20 while 11 will become the right child of 20. The output binary tree after swapping the nodes at this level will be as follows.

Now we will move to the next level. At this level, all the nodes are leaf nodes and have no children. Due to this, there will be no swapping at this level, and the tree obtained in the previous step is the final output.

Algorithm to find mirror image of a binary tree

As we have seen above, we can find the mirror image of a binary tree by swapping the left child and right child of each node. Let us try to formulate the algorithm in a systematic way.

In the last example, at the second level, each node has only leaf nodes as children. To find the mirror image at this level, we just swapped the left and right child of each node. At the root node, we swapped both of its subtrees. Since we are swapping each subtree (a leaf node is also a subtree), we can implement this algorithm using recursion.

The algorithm for finding a mirror image of a binary tree can be formulated as follows.

  1. Start from the root node.
  2. Recursively find the mirror image of the left subtree.
  3. Recursively find the mirror image of the right subtree.
  4. Swap the left and right subtree.

Implementation of Algorithm in Python

Now we will implement the algorithm to find the mirror image of a binary tree in Python.


class BinaryTreeNode:
    def __init__(self, data):
        self.data = data
        self.leftChild = None
        self.rightChild = None


def mirror(node):
    if node is None:
        return None
    # recursively mirror both subtrees first
    mirror(node.leftChild)
    mirror(node.rightChild)
    # then swap the left and right children of the current node
    temp = node.leftChild
    node.leftChild = node.rightChild
    node.rightChild = temp


def insert(root, newValue):
    # if binary search tree is empty, create a new node and declare it as root
    if root is None:
        root = BinaryTreeNode(newValue)
        return root
    # if newValue is less than value of data in root, add it to left subtree and proceed recursively
    if newValue < root.data:
        root.leftChild = insert(root.leftChild, newValue)
    else:
        # if newValue is greater than value of data in root, add it to right subtree and proceed recursively
        root.rightChild = insert(root.rightChild, newValue)
    return root


def inorder(root):
    if root:
        inorder(root.leftChild)
        print(root.data, end=" ")
        inorder(root.rightChild)


root = insert(None, 50)
insert(root, 20)
insert(root, 53)
insert(root, 11)
insert(root, 22)
insert(root, 52)
insert(root, 78)
print("Inorder Traversal of tree before mirroring:")
inorder(root)
mirror(root)
print("\nInorder Traversal of tree after mirroring:")
inorder(root)

Output:

Inorder Traversal of tree before mirroring:
11 20 22 50 52 53 78 
Inorder Traversal of tree after mirroring:
78 53 52 50 22 20 11 

Here, we defined a BinaryTreeNode class. After that, we defined a function to insert elements into the binary tree. We also used the in-order tree traversal algorithm to print the elements of the tree before and after finding the mirror image.

Conclusion

In this article, we have implemented an algorithm to find the mirror image of a binary tree. To learn more about other data structures, you can read this article on Linked List in Python. Stay tuned for more articles on implementation of different algorithms in Python.

The post Find the mirror image of a binary tree appeared first on PythonForBeginners.com.

September 21, 2021 12:15 PM UTC


Stack Abuse

Machine Learning: Overfitting Is Your Friend, Not Your Foe

Note: These are the musings of a man - flawed and prone to misjudgement. The point of writing this is to promote a discussion on the topic, not to be right or contrarian. If any glaring mistakes are present in the writing, please let me know.

Let me preface the potentially provocative title with:

It's true, nobody wants overfit end models, just like nobody wants underfit end models.

Overfit models perform great on training data, but can't generalize well to new instances. What you end up with is a model that's approaching a fully hard-coded model tailored to a specific dataset.

Underfit models can't generalize to new data, but they can't model the original training set either.

The right model is one that fits the data in such a way that it performs well at predicting values in the training, validation and test sets, as well as on new instances.

Overfitting vs. Data Scientists

Battling overfitting is given a spotlight because it's more illusory, and more tempting for a rookie to create overfit models when they start their Machine Learning journey. Throughout books, blog posts and courses, a common scenario is given:

"This model has a 100% accuracy rate! It's perfect! Or not. Actually, it just badly overfits the dataset, and when testing it on new instances, it performs with just X%, which is equal to random guessing."

After these sections, entire book and course chapters are dedicated to battling overfitting and how to avoid it. The word itself became stigmatized as a generally bad thing. And this is where the general conception arises:

"I must avoid overfitting at all costs."

It's given much more spotlight than underfitting, which is equally "bad". It's worth noting that "bad" is an arbitrary term, and none of these conditions are inherently "good" or "bad". Some may claim that overfit models are technically more useful, because they at least perform well on some data while underfit models perform well on no data, but the illusion of success is a good candidate for outweighing this benefit.

For reference, let's consult Google Trends and the Google Ngram Viewer. Google Trends displays trends of search data, while the Google Ngram Viewer counts the number of occurrences of n-grams (sequences of n items, such as words) in literature, parsing through a vast number of books through the ages:

overfitting vs underfitting search trends and ngram viewer

Everybody talks about overfitting and mostly in the context of avoiding it - which oftentimes leads people to a general notion that it's inherently a bad thing.

This is true, to a degree. Yes - you don't want the end model to overfit badly, otherwise, it's practically useless. But you don't arrive at the end model right away - you tweak it numerous times, with various hyperparameters. This process is where you shouldn't mind seeing overfitting happen - it's a good sign, though, not a good result.

How Overfitting Isn’t as Bad as It’s Made Out to Be

A model and architecture that has the ability to overfit is more likely to have the ability to generalize well to new instances, if you simplify it (and/or tweak the data).

If a model can overfit, it has enough entropic capacity to extract features (in a meaningful and non-meaningful way) from data. From there, it's either that the model has more than required entropic capacity (complexity/power) or that the data itself isn't enough (very common case).

The reverse statement can also be true, but more rarely. If a given model or architecture underfits, you can try tweaking the model to see if it picks up certain features, but the type of model might just be plain wrong for the task and you won't be able to fit the data with it no matter what you do. Some models just get stuck at some level of accuracy, as they simply can't extract enough features to distinguish between certain classes, or predict values.

In cooking - a reverse analogy can be created. It's better to undersalt the stew early on, as you can always add salt later to taste, but it's hard to take it away once already put in.

In Machine Learning - it's the opposite. It's better to have a model overfit, then simplify it, change hyperparameters, augment the data, etc. to make it generalize well, but it's harder (in practical settings) to do the opposite. Avoiding overfitting before it happens might very well keep you away from finding the right model and/or architecture for a longer period of time.

In practice, and in some of the most fascinating use cases of Machine Learning and Deep Learning, you'll be working on datasets that you'll have trouble overfitting. These will be datasets that you'll routinely underfit, without being able to find models and architectures that can generalize well and extract features.

It's also worth noting the difference between what I call true overfitting and partial overfitting. A model that overfits a dataset, and achieves 60% accuracy on the training set, with only 40% on the validation and test sets is overfitting a part of the data. However, it's not truly overfitting in the sense of eclipsing the entire dataset, and achieving a near 100% (false) accuracy rate, while its validation and test sets sit low at, say, ~40%.

A model that partially overfits isn't one that'll be able to generalize well with simplification, as it doesn't have enough entropic capacity to truly (over)fit. Once it does, my argument applies, though it doesn't guarantee success, as clarified in the following sections.

Case Study - Friendly Overfitting Argument

The MNIST handwritten digits dataset, compiled by Yann LeCun is one of the classical benchmark datasets used for training classification models.

It's also the most overused dataset, potentially ever.

Nothing wrong with the dataset itself - it's actually pretty good, but finding example upon example on the same dataset is boring. At one point - we overfit ourselves looking at it. How much? Here's my attempt at listing the first ten MNIST digits from the top of my head:

5, 0, 4, 1, 9, 2, 2, 4, 3

How did I do?

from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt

# Import and normalize the images, splitting out a validation set
(X_train_full, Y_train_full), (X_test, Y_test) = keras.datasets.mnist.load_data()

X_valid, X_train = X_train_full[:5000]/255.0, X_train_full[5000:]/255.0
Y_valid, Y_train = Y_train_full[:5000], Y_train_full[5000:]

X_test = X_test/255.0

# Print out the first ten digits
fig, ax = plt.subplots(1, 10, figsize=(10,2))
for i in range(10):
    ax[i].imshow(X_train_full[i])
    ax[i].axis('off')
    plt.subplots_adjust(wspace=1) 

plt.show()

Almost there.

I'll use this chance to make a public plea to all content creators to not overuse this dataset beyond the introductory parts, where the simplicity of the dataset can be used to lower the barrier to entry. Please.

Additionally, this dataset makes it hard to build a model that underfits. It's just too simple - and even a fairly small Multilayer Perceptron (MLP) classifier built with an intuitive number of layers and neurons per layer can easily reach upwards of 98% accuracy on the training, testing and validation sets. Here's a Jupyter Notebook of a simple MLP achieving ~98% accuracy on the training, validation and testing sets, which I spun up with sensible defaults.

I haven't even bothered to try tuning it to perform better than the initial setup.
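
For reference, a minimal MLP along those lines might look like the following - the layer sizes are illustrative assumptions rather than the notebook's exact setup, and it assumes the X_train/Y_train split from the snippet above:

model = keras.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),   # MNIST images are 28x28 grayscale
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax')  # one output per digit
])

model.compile(loss="sparse_categorical_crossentropy",
              optimizer="adam",
              metrics=["accuracy"])

model.fit(X_train, Y_train, epochs=10, validation_data=(X_valid, Y_valid))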

The CIFAR10 and CIFAR100 Datasets

Let's use a dataset that's more complicated than the MNIST handwritten digits - one that makes a simple MLP underfit, but is simple enough to let a decently-sized CNN truly overfit on it. A good candidate is the CIFAR dataset.

There are 10 classes of images in CIFAR10, and 100 in CIFAR100. Additionally, the CIFAR100 dataset has 20 families of similar classes, which means the network additionally has to learn the minute differences between similar, but different classes. These are known as "fine labels" (100) and "coarse labels" (20), and predicting these is equivalent to predicting either the specific class or just the family it belongs to.

For instance, here's a superclass (coarse label) and its subclasses (fine labels):

Superclass       Subclasses
food containers  bottles, bowls, cans, cups, plates

A cup is a cylinder, similar to a soda can, and some bottles may be too. Since these low-level features are relatively similar, it's easy to chuck them all into the "food container" category, but higher-level abstraction is required to properly guess whether something is a "cup" or a "can".
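
As a side note, Keras can load either label set for CIFAR100 via the label_mode argument of load_data() ("fine" is the default). A quick sketch:

from tensorflow import keras

# Load the same images with the two label granularities
(_, y_fine), _ = keras.datasets.cifar100.load_data(label_mode="fine")
(_, y_coarse), _ = keras.datasets.cifar100.load_data(label_mode="coarse")

print(y_fine[:5].ravel())    # values in 0..99 (specific classes)
print(y_coarse[:5].ravel())  # values in 0..19 (superclasses)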

What makes this job even harder is that CIFAR10 has 6000 images per class, while CIFAR100 has 600 images per class, giving the network fewer images to learn the ever so subtle differences from. Cups without handles exist, and cans without ridges do too. From a profile - it might not be too easy to tell them apart.

This is where, say, a Multilayer Perceptron simply doesn't have the abstraction power to learn, and it's doomed to fail, horribly underfitting. Convolutional Neural Networks are built based on the Neocognitron, which took hints from neuroscience and the hierarchical pattern recognition that the brain performs. These networks are able to extract features like this, and excel at the task. So much so that they oftentimes overfit badly and can't be used as is in the end - where we typically sacrifice some accuracy for the sake of generalization ability.

Let's train two different network architectures on the CIFAR10 and CIFAR100 dataset as an illustration of my point.

This is also where we'll be able to see how, even when a network overfits, that's no guarantee that the network will generalize well if simplified - it might not be able to, though there is a tendency. The network might be right, but the data might not be enough.

In the case of CIFAR100 - just 500 images for training (and 100 for testing) per class is not enough for a simple CNN to really generalize well on the entire 100 classes, and we'll have to perform data augmentation to help it along. Even with data augmentation, we might not get a highly accurate network, as there's only so much you can do to the data. If the same architecture performs well on CIFAR10, but not CIFAR100 - it means it simply can't distinguish some of the more fine-grained details that make the difference between cylindrical objects that we call a "cup", "can" and "bottle", for instance.

The vast majority of advanced network architectures that achieve a high accuracy on the CIFAR100 dataset perform data augmentation or otherwise expand the training set.

Most of them have to, and that's not a sign of bad engineering. In fact - the fact that we can expand these datasets and help networks generalize better is a sign of engineering ingenuity.

Additionally, I'd invite any human to try and guess what these are, if they're convinced that image classification isn't too hard with images as small as 32x32:

Is Image 4 a few oranges? Ping pong balls? Egg yolks? Well, probably not egg yolks, but that requires prior knowledge on what "eggs" are and whether you're likely to find yolks sitting on the table, which a network won't have. Consider the amount of prior knowledge you may have regarding the world and how much it affects what you see.

Importing the Data

We'll be using Keras as the deep learning library of choice, but you can follow along with other libraries or even your custom models if you're up for it.

But first off, let's load it in, separate the data into a training, testing and validation set, normalizing the image values to 0..1:

from tensorflow import keras
import numpy as np
import pandas as pd  # used later to plot the training history
import matplotlib.pyplot as plt

# Starting with CIFAR10
(X_train_full, Y_train_full), (X_test, Y_test) = keras.datasets.cifar10.load_data()

X_valid, X_train = X_train_full[:5000]/255.0, X_train_full[5000:]/255.0
Y_valid, Y_train = Y_train_full[:5000], Y_train_full[5000:]

X_test = X_test/255.0

Then, let's visualize some of the images in the dataset to get an idea of what we're up against:

fig, ax = plt.subplots(5, 5, figsize=(10, 10))
ax = ax.ravel()

# Labels come as numbers in [0..9], so here are the class names for humans
class_names = ['Airplane', 'Automobile', 'Bird', 'Cat', 'Deer', 'Dog', 'Frog', 'Horse', 'Ship', 'Truck']

for i in range(25):
    ax[i].imshow(X_train_full[i])
    ax[i].set_title(class_names[Y_train_full[i][0]])
    ax[i].axis('off')
    plt.subplots_adjust(wspace=1) 

plt.show()

Underfitting Multilayer Perceptron

Pretty much no matter what we do, the MLP won't perform that well. It'll definitely reach some level of accuracy based on the raw sequences of information coming in - but this number is capped and probably won't be too high.

The network will start overfitting at one point, learning the concrete sequences of data denoting images, but will still have low accuracy on the training set even when overfitting, which is the prime time to stop training it, since it simply can't fit the data well. Training networks has a carbon footprint, you know.

Let's add in an EarlyStopping callback to avoid running the network beyond the point of common sense, and set the epochs to a number beyond what we'll run it for (so EarlyStopping can kick in).

We'll use the Sequential API to add a couple of layers with BatchNormalization and a bit of Dropout. They help with generalization and we want to at least try to get this model to learn something.

The main hyperparameters we can tweak here are the number of layers, their sizes, activation functions, kernel initializers and dropout rates, and here's a "decently" performing setup:

checkpoint = keras.callbacks.ModelCheckpoint("simple_dense.h5", save_best_only=True)
early_stopping = keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)

model = keras.Sequential([
  keras.layers.Flatten(input_shape=[32, 32, 3]),
  keras.layers.BatchNormalization(),
  keras.layers.Dense(75),
    
  keras.layers.Dense(50, activation='elu'),
  keras.layers.BatchNormalization(),
  keras.layers.Dropout(0.1),
    
  keras.layers.Dense(50, activation='elu'),
  keras.layers.BatchNormalization(),
  keras.layers.Dropout(0.1),
    
  keras.layers.Dense(10, activation='softmax')
])

model.compile(loss="sparse_categorical_crossentropy",
              optimizer=keras.optimizers.Nadam(learning_rate=1e-4),
              metrics=["accuracy"])

history = model.fit(X_train, 
                    Y_train, 
                    epochs=150, 
                    validation_data=(X_valid, Y_valid),
                    callbacks=[checkpoint, early_stopping])

Let's see if the starting hypothesis is true - the network will start out learning and generalizing to some extent, but will end up with low accuracy on the training set as well as the testing and validation sets, resulting in an overall low accuracy.

For CIFAR10, the network performs "okay"-ish:

Epoch 1/150
1407/1407 [==============================] - 5s 3ms/step - loss: 1.9706 - accuracy: 0.3108 - val_loss: 1.6841 - val_accuracy: 0.4100
...
Epoch 50/150
1407/1407 [==============================] - 4s 3ms/step - loss: 1.2927 - accuracy: 0.5403 - val_loss: 1.3893 - val_accuracy: 0.5122

Let's take a look at the history of its learning:

pd.DataFrame(history.history).plot()
plt.show()

model.evaluate(X_test, Y_test)

313/313 [==============================] - 0s 926us/step - loss: 1.3836 - accuracy: 0.5058
[1.383605718612671, 0.5058000087738037]

The overall accuracy gets up to ~50%, and the network gets here pretty quickly and starts plateauing. 5/10 images being correctly classified sounds like tossing a coin, but remember that there are 10 classes here, so if it were randomly guessing, it'd on average guess a single image out of ten correctly. Let's switch to the CIFAR100 dataset, which also necessitates a network with at least a tiny bit more power, as there are fewer training instances per class, as well as a vastly higher number of classes:

checkpoint = keras.callbacks.ModelCheckpoint("bigger_dense.h5", save_best_only=True)
early_stopping = keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)

# Changing the loaded data and re-creating the train/validation split
(X_train_full, Y_train_full), (X_test, Y_test) = keras.datasets.cifar100.load_data()

X_valid, X_train = X_train_full[:5000]/255.0, X_train_full[5000:]/255.0
Y_valid, Y_train = Y_train_full[:5000], Y_train_full[5000:]
X_test = X_test/255.0

# Modify the model
model1 = keras.Sequential([
  keras.layers.Flatten(input_shape=[32, 32, 3]),
  keras.layers.BatchNormalization(),
  keras.layers.Dense(256, activation='relu', kernel_initializer="he_normal"),
    
  keras.layers.Dense(128, activation='relu'),
  keras.layers.BatchNormalization(),
  keras.layers.Dropout(0.1),

  keras.layers.Dense(100, activation='softmax')
])


model1.compile(loss="sparse_categorical_crossentropy",
              optimizer=keras.optimizers.Nadam(learning_rate=1e-4),
              metrics=["accuracy"])

history = model1.fit(X_train, 
                    Y_train, 
                    epochs=150, 
                    validation_data=(X_valid, Y_valid),
                    callbacks=[checkpoint, early_stopping])

The network performs fairly badly:

Epoch 1/150
1407/1407 [==============================] - 13s 9ms/step - loss: 4.2260 - accuracy: 0.0836 - val_loss: 3.8682 - val_accuracy: 0.1238
...
Epoch 24/150
1407/1407 [==============================] - 12s 8ms/step - loss: 2.3598 - accuracy: 0.4006 - val_loss: 3.3577 - val_accuracy: 0.2434

And let's plot the history of its progress, as well as evaluate it on the testing set (which will likely perform as well as the validation set):

pd.DataFrame(history.history).plot()
plt.show()

model1.evaluate(X_test, Y_test)

313/313 [==============================] - 0s 2ms/step - loss: 3.2681 - accuracy: 0.2408
[3.2681326866149902, 0.24079999327659607]

As expected, the network wasn't able to grasp the data well. It ended up having an overfit accuracy of 40%, and an actual accuracy of ~24%.

The accuracy capped at 40% - it wasn't really able to overfit the dataset, even if it overfit some parts of it that it was able to discern given the limited architecture. This model doesn't have the entropic capacity required to truly overfit for the sake of my argument.

This model and its architecture simply aren't well suited for this task - and while we could technically get it to (over)fit more, it'd still have issues in the long run. For instance, let's turn it into a bigger network, which would theoretically let it recognize more complex patterns:

model2 = keras.Sequential([
  keras.layers.Flatten(input_shape=[32, 32, 3]),
  keras.layers.BatchNormalization(),
  keras.layers.Dense(512, activation='relu', kernel_initializer="he_normal"),
    
  keras.layers.Dense(256, activation='relu'),
  keras.layers.BatchNormalization(),
  keras.layers.Dropout(0.1),
    
  keras.layers.Dense(128, activation='relu'),
  keras.layers.BatchNormalization(),
  keras.layers.Dropout(0.1),

  keras.layers.Dense(100, activation='softmax')
])

Though, this doesn't do much better at all:

Epoch 24/150
1407/1407 [==============================] - 28s 20ms/step - loss: 2.1202 - accuracy: 0.4507 - val_loss: 3.2796 - val_accuracy: 0.2528

It's much more complex (density explodes), yet it simply cannot extract much more:

model1.summary()
model2.summary()
Model: "sequential_17"
...
Total params: 845,284
Trainable params: 838,884
Non-trainable params: 6,400
_________________________________________________________________
Model: "sequential_18"
...
Total params: 1,764,324
Trainable params: 1,757,412
Non-trainable params: 6,912

Overfitting Convolutional Neural Network on CIFAR10

Now, let's try doing something different. Switching to a CNN will significantly help with extracting features from the dataset, thereby allowing the model to truly overfit, reaching much higher (illusory) accuracy.

We'll kick out the EarlyStopping callback to let it do its thing. Additionally, we won't be using Dropout layers, and instead try to force the network to learn the features through more layers.

Note: Outside of the context of trying to prove the argument, this would be horrible advice. It's the opposite of what you'd want to do by the end. Dropout helps networks generalize better, by forcing the non-dropped neurons to pick up the slack. Forcing the network to learn through more layers is more likely to lead to an overfit model.

The reason I'm purposefully doing this is to allow the network to horribly overfit as a sign of its ability to actually discern features, before simplifying it and adding Dropout to really allow it to generalize. If it reaches high (illusory) accuracy, it can extract much more than the MLP model, which means we can start simplifying it.

Let's once again use the Sequential API to build a CNN, firstly on the CIFAR10 dataset:

checkpoint = keras.callbacks.ModelCheckpoint("overcomplicated_cnn_cifar10.h5", save_best_only=True)

model = keras.models.Sequential([
    keras.layers.Conv2D(64, 3, activation='relu', 
                        kernel_initializer="he_normal", 
                        kernel_regularizer=keras.regularizers.l2(l=0.01), 
                        padding='same', 
                        input_shape=[32, 32, 3]),
    keras.layers.Conv2D(64, 3, activation='relu', padding='same'),
    keras.layers.MaxPooling2D(2),
    
    keras.layers.Conv2D(128, 2, activation='relu', padding='same'),
    keras.layers.Conv2D(128, 2, activation='relu', padding='same'),
    keras.layers.MaxPooling2D(2),
    
    keras.layers.Conv2D(256, 3, activation='relu', padding='same'),
    keras.layers.BatchNormalization(),
    keras.layers.Conv2D(256, 3, activation='relu', padding='same'),
    keras.layers.MaxPooling2D(2),
    
    keras.layers.Conv2D(128, 3, activation='relu', padding='same'),
    keras.layers.BatchNormalization(),
    keras.layers.Conv2D(128, 3, activation='relu', padding='same'),
    keras.layers.MaxPooling2D(2),
    
    keras.layers.Conv2D(64, 3, activation='relu', padding='same'),
    keras.layers.BatchNormalization(),
    keras.layers.Conv2D(64, 3, activation='relu', padding='same'),
    keras.layers.MaxPooling2D(2),
    
    keras.layers.Flatten(),    
    keras.layers.Dense(32, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])

model.compile(loss="sparse_categorical_crossentropy",
              optimizer=keras.optimizers.Adam(learning_rate=1e-3),
              metrics=["accuracy"])

model.summary()

history = model.fit(X_train, 
                    Y_train, 
                    epochs=150,
                    batch_size=64,
                    validation_data=(X_valid, Y_valid),
                    callbacks=[checkpoint])

Awesome, it overfit pretty quickly! Within just a few epochs, it started overfitting the data, and by epoch 31, it got up to 98% training accuracy, with a lower validation accuracy:

Epoch 1/150
704/704 [==============================] - 149s 210ms/step - loss: 1.9561 - accuracy: 0.4683 - val_loss: 2.5060 - val_accuracy: 0.3760
...
Epoch 31/150
704/704 [==============================] - 149s 211ms/step - loss: 0.0610 - accuracy: 0.9841 - val_loss: 1.0433 - val_accuracy: 0.6958

Since there are only 10 output classes, even though we tried overfitting it a lot by creating an unnecessarily big CNN, the validation accuracy is still fairly high.

Simplifying the Convolutional Neural Network on CIFAR10

Now, let's simplify it to see how it'll fare with a more reasonable architecture. We'll add in BatchNormalization and Dropout as both help with the generalization:

checkpoint = keras.callbacks.ModelCheckpoint("simplified_cnn_cifar10.h5", save_best_only=True)
early_stopping = keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)

model = keras.models.Sequential([
    keras.layers.Conv2D(32, 3, activation='relu', kernel_initializer="he_normal", kernel_regularizer=keras.regularizers.l2(l=0.01), padding='same', input_shape=[32, 32, 3]),
    keras.layers.BatchNormalization(),
    keras.layers.Conv2D(32, 3, activation='relu', padding='same'),
    keras.layers.BatchNormalization(),
    keras.layers.MaxPooling2D(2),
    keras.layers.Dropout(0.4),
    
    keras.layers.Conv2D(64, 2, activation='relu', padding='same'),
    keras.layers.BatchNormalization(),
    keras.layers.Conv2D(64, 2, activation='relu', padding='same'),
    keras.layers.BatchNormalization(),
    keras.layers.MaxPooling2D(2),
    keras.layers.Dropout(0.4),
    
    keras.layers.Conv2D(128, 3, activation='relu', padding='same'),
    keras.layers.BatchNormalization(),
    keras.layers.Conv2D(128, 3, activation='relu', padding='same'),
    keras.layers.BatchNormalization(),
    keras.layers.MaxPooling2D(2),
    keras.layers.Dropout(0.5),
    
    keras.layers.Flatten(),    
    keras.layers.Dense(32, activation='relu'),
    keras.layers.BatchNormalization(),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(10, activation='softmax')
])

model.compile(loss="sparse_categorical_crossentropy",
              optimizer=keras.optimizers.Adam(learning_rate=1e-3),
              metrics=["accuracy"])

model.summary()

history = model.fit(X_train, 
                    Y_train, 
                    epochs=150,
                    batch_size=64,
                    validation_data=(X_valid, Y_valid),
                    callbacks=[checkpoint, early_stopping])

This model has a (modest) count of 323,146 trainable parameters, compared to 1,579,178 from the previous CNN. How does it perform?

Epoch 1/150
704/704 [==============================] - 91s 127ms/step - loss: 2.1327 - accuracy: 0.3910 - val_loss: 1.5495 - val_accuracy: 0.5406
...
Epoch 52/150
704/704 [==============================] - 89s 127ms/step - loss: 0.4091 - accuracy: 0.8648 - val_loss: 0.4694 - val_accuracy: 0.8500

It actually achieves a pretty decent ~85% accuracy! Occam's Razor strikes again. Let's take a look at some of the results:

y_preds = model.predict(X_test)
print(y_preds[1])
print(np.argmax(y_preds[1]))

fig, ax = plt.subplots(6, 6, figsize=(10, 10))
ax = ax.ravel()

for i in range(0, 36):
    ax[i].imshow(X_test[i])
    ax[i].set_title("Actual: %s\nPred: %s" % (class_names[Y_test[i][0]], class_names[np.argmax(y_preds[i])]))
    ax[i].axis('off')
    plt.subplots_adjust(wspace=1)
    
plt.show()

The main misclassifications are two images in this small set - a dog was misclassified as a deer (respectable enough), and a closeup of an emu bird was classified as a cat (funny enough, so we'll let it slide).

Overfitting Convolutional Neural Network on CIFAR100

What happens when we go for the CIFAR100 dataset?

checkpoint = keras.callbacks.ModelCheckpoint("overcomplicated_cnn_model_cifar100.h5", save_best_only=True)
early_stopping = keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)

model = keras.models.Sequential([
    keras.layers.Conv2D(32, 3, activation='relu', kernel_initializer="he_normal", kernel_regularizer=keras.regularizers.l2(l=0.01), padding='same', input_shape=[32, 32, 3]),
    keras.layers.BatchNormalization(),
    keras.layers.Conv2D(32, 3, activation='relu', padding='same'),
    keras.layers.BatchNormalization(),
    keras.layers.MaxPooling2D(2),
    
    keras.layers.Conv2D(64, 2, activation='relu', padding='same'),
    keras.layers.BatchNormalization(),
    keras.layers.Conv2D(64, 2, activation='relu', padding='same'),
    keras.layers.BatchNormalization(),
    keras.layers.MaxPooling2D(2),
    
    keras.layers.Conv2D(128, 3, activation='relu', padding='same'),
    keras.layers.BatchNormalization(),
    keras.layers.Conv2D(128, 3, activation='relu', padding='same'),
    keras.layers.BatchNormalization(),
    keras.layers.MaxPooling2D(2),
    
    keras.layers.Conv2D(128, 3, activation='relu', padding='same'),
    keras.layers.BatchNormalization(),
    keras.layers.Conv2D(128, 3, activation='relu', padding='same'),
    keras.layers.BatchNormalization(),
    keras.layers.MaxPooling2D(2),
    
    keras.layers.Conv2D(64, 3, activation='relu', padding='same'),
    keras.layers.BatchNormalization(),
    keras.layers.Conv2D(64, 3, activation='relu', padding='same'),
    keras.layers.BatchNormalization(),
    keras.layers.MaxPooling2D(2),
    
    keras.layers.Flatten(),    
    keras.layers.Dense(256, activation='relu'),
    keras.layers.BatchNormalization(),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.BatchNormalization(),
    
    keras.layers.Dense(100, activation='softmax')
])

model.compile(loss="sparse_categorical_crossentropy",
              optimizer=keras.optimizers.Adam(learning_rate=1e-3),
              metrics=["accuracy"])

model.summary()

history = model.fit(X_train, 
                    Y_train, 
                    epochs=150,
                    batch_size=64,
                    validation_data=(X_valid, Y_valid),
                    callbacks=[checkpoint])
Epoch 1/150
704/704 [==============================] - 97s 137ms/step - loss: 4.1752 - accuracy: 0.1336 - val_loss: 3.9696 - val_accuracy: 0.1392
...
Epoch 42/150
704/704 [==============================] - 95s 135ms/step - loss: 0.1543 - accuracy: 0.9572 - val_loss: 4.1394 - val_accuracy: 0.4458

Wonderful! ~96% accuracy on the training set! Don't mind the ~44% validation accuracy just yet. Let's simplify the model real quick to get it to generalize better.

Failure to Generalize After Simplification

And this is where it becomes clear that the ability to overfit doesn't guarantee that the model will generalize better when simplified. In the case of CIFAR100, there aren't many training instances per class, and this will likely prevent a simplified version of the previous model from learning well. Let's try it out:

checkpoint = keras.callbacks.ModelCheckpoint("simplified_cnn_model_cifar100.h5", save_best_only=True)
early_stopping = keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)

model = keras.models.Sequential([
    keras.layers.Conv2D(32, 3, activation='relu', kernel_initializer="he_normal", kernel_regularizer=keras.regularizers.l2(l=0.01), padding='same', input_shape=[32, 32, 3]),
    keras.layers.BatchNormalization(),
    keras.layers.Conv2D(32, 3, activation='relu', padding='same'),
    keras.layers.BatchNormalization(),
    keras.layers.MaxPooling2D(2),
    keras.layers.Dropout(0.4),
    
    keras.layers.Conv2D(64, 2, activation='relu', padding='same'),
    keras.layers.BatchNormalization(),
    keras.layers.Conv2D(64, 2, activation='relu', padding='same'),
    keras.layers.BatchNormalization(),
    keras.layers.MaxPooling2D(2),
    keras.layers.Dropout(0.4),
    
    keras.layers.Conv2D(128, 3, activation='relu', padding='same'),
    keras.layers.BatchNormalization(),
    keras.layers.Conv2D(128, 3, activation='relu', padding='same'),
    keras.layers.BatchNormalization(),
    keras.layers.MaxPooling2D(2),
    keras.layers.Dropout(0.5),
    
    keras.layers.Flatten(),    
    keras.layers.Dense(256, activation='relu'),
    keras.layers.BatchNormalization(),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(100, activation='softmax')
])

model.compile(loss="sparse_categorical_crossentropy",
              optimizer=keras.optimizers.Adam(learning_rate=1e-3),
              metrics=["accuracy"])

history = model.fit(X_train, 
                    Y_train, 
                    epochs=150,
                    batch_size=64,
                    validation_data=(X_valid, Y_valid),
                    callbacks=[checkpoint, early_stopping])
Epoch 1/150
704/704 [==============================] - 96s 135ms/step - loss: 4.4432 - accuracy: 0.1112 - val_loss: 3.7893 - val_accuracy: 0.1702
...
Epoch 48/150
704/704 [==============================] - 92s 131ms/step - loss: 1.2550 - accuracy: 0.6370 - val_loss: 1.7147 - val_accuracy: 0.5466

It's plateauing and can't really generalize the data. In this case, it might not be the model's fault - maybe it's just right for the task, especially given the high accuracy on the CIFAR10 dataset, which has the same input shape and similar images in the dataset. It appears that the model can be reasonably accurate with the general shapes, but not with the distinction between fine shapes.

The simpler model actually performs better than the more complicated one in terms of validation accuracy - so the more complex CNN doesn't pick up these fine details much better at all. Here, the problem most likely lies in the fact that there are only 500 training images per class, which really isn't enough. In the more complex network, this leads to overfitting, because there's not enough diversity - and when simplified to avoid overfitting, this causes underfitting, as again, there's not enough diversity.

This is why the vast majority of the papers linked before, and the vast majority of networks augment the data of the CIFAR100 dataset.

It's genuinely not a dataset that's easy to get high accuracy on, unlike the MNIST handwritten digits dataset, and a simple CNN like the one we're building probably won't cut it. Just remember the number of quite specific classes, how uninformative some of the images are, and just how much prior knowledge humans rely on to discern between these.

Let's do our best by augmenting a few images and artificially expanding the training data, to at least try to get a higher accuracy. Keep in mind that CIFAR100 is, again, a genuinely difficult dataset to get high accuracy on with simple models. The state-of-the-art models use different and novel techniques to shave off errors, and many of these models aren't even CNNs - they're Transformers.

If you'd like to take a look at the landscape of these models, PapersWithCode has done a beautiful compilation of papers, source code and results.

Data Augmentation with Keras' ImageDataGenerator Class

Will data augmentation help? Usually, it does, but with a serious lack of training data like we're facing, there's only so much you can do with random rotations, flipping, cropping, etc. If an architecture can't generalize well on a dataset, you'll likely boost it via data augmentation, but it probably won't be by a whole lot.

That being said, let's use Keras' ImageDataGenerator class to try and generate some new training data with random changes, in hopes of improving the model's accuracy. If it does improve, it shouldn't be by a huge amount, and it'll likely get back to partially overfitting the dataset without an ability to either generalize well or fully overfit the data.

Given the constant random variations in the data, the model is less likely to overfit on the same number of epochs, as the variations make it keep adjusting to "new" data. Let's run it for, say, 300 epochs, which is significantly more than the rest of the networks we've trained. This is possible without major overfitting, again, due to the random modifications made to the images while they're flowing in:

checkpoint = keras.callbacks.ModelCheckpoint("augmented_cnn.h5", save_best_only=True)

model = keras.models.Sequential([
    keras.layers.Conv2D(64, 3, activation='relu', kernel_initializer="he_normal", kernel_regularizer=keras.regularizers.l2(l=0.01), padding='same', input_shape=[32, 32, 3]),
    keras.layers.Conv2D(64, 3, activation='relu', padding='same'),
    keras.layers.BatchNormalization(),
    keras.layers.MaxPooling2D(2),
    keras.layers.Dropout(0.4),
    
    keras.layers.Conv2D(128, 2, activation='relu', padding='same'),
    keras.layers.Conv2D(128, 2, activation='relu', padding='same'),
    keras.layers.Conv2D(128, 2, activation='relu', padding='same'),
    keras.layers.BatchNormalization(),
    keras.layers.MaxPooling2D(2),
    keras.layers.Dropout(0.4),
    
    keras.layers.Conv2D(256, 3, activation='relu', padding='same'),
    keras.layers.Conv2D(256, 3, activation='relu', padding='same'),
    keras.layers.Conv2D(256, 3, activation='relu', padding='same'),
    keras.layers.BatchNormalization(),
    keras.layers.MaxPooling2D(2),
    keras.layers.Dropout(0.4),
    
    keras.layers.Flatten(),    
    keras.layers.Dense(512, activation='relu'),
    keras.layers.BatchNormalization(),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(100, activation='softmax')
])

    
train_datagen = ImageDataGenerator(rotation_range=30,
                                   height_shift_range=0.2,
                                   width_shift_range=0.2,
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   horizontal_flip=True,
                                   vertical_flip=True,
                                   fill_mode='nearest')

valid_datagen = ImageDataGenerator()

train_datagen.fit(X_train)
valid_datagen.fit(X_valid)

train_generator = train_datagen.flow(X_train, Y_train, batch_size=128)
valid_generator = valid_datagen.flow(X_valid, Y_valid, batch_size=128)

model.compile(loss="sparse_categorical_crossentropy",
              optimizer=keras.optimizers.Adam(learning_rate=1e-3, decay=1e-6),
              metrics=["accuracy"])

history = model.fit(train_generator, 
                    epochs=300,
                    batch_size=128,
                    steps_per_epoch=len(X_train)//128,
                    validation_data=valid_generator,
                    callbacks=[checkpoint])
Epoch 1/300
351/351 [==============================] - 16s 44ms/step - loss: 5.3788 - accuracy: 0.0487 - val_loss: 5.3474 - val_accuracy: 0.0440
...
Epoch 300/300
351/351 [==============================] - 15s 43ms/step - loss: 1.0571 - accuracy: 0.6895 - val_loss: 2.0005 - val_accuracy: 0.5532

The model is performing with ~55% on the validation set, and is still overfitting the data partially. The val_loss has stopped going down, and is quite rocky, even with a higher batch_size.

This network simply can't learn and fit the data with high accuracy, even though variations on it do have the entropic capacity to overfit the data.

Conclusion?

Overfitting isn't inherently a bad thing - it's just a thing. No, you don't want overfit end-models, but it shouldn't be treated as the plague and can even be a good sign that a model could perform better given more data and a simplification step. This isn't guaranteed, by any means, and the CIFAR100 dataset has been used as an example of a dataset that's not easy to generalize well to.

The point of this rambling is, again, not to be contrarian - but to incite discussion on the topic, which doesn't appear to be taking much place.

Who am I to make this claim?

Just someone who sits home, practicing the craft, with a deep fascination towards tomorrow.

Do I have the ability to be wrong?

Very much so.

How should you take this piece?

Take it as you may - think for yourself whether it makes sense or not. If you don't think I'm out of place for noting this, let me know. If you think I'm wrong on this - by all means, please let me know and don't mince your words. :)

September 21, 2021 10:30 AM UTC


Wing Tips

Debug Python Code Run by Docker Compose with Wing Pro

This Wing Tip describes how to configure Docker Compose so that Python code running on selected container services can be debugged with Wing Pro. This makes it easy to develop and debug containerized applications written in Python.

Getting Started

Before you can work with Docker Compose you will need to download and install it and then create a working test cluster. See Install Docker Compose for details.

You should also install Wing Pro if you don't already have it.

Configuration

To configure Wing, use New Project in the Project menu. After selecting your source directory, choose Use Existing Python on the second dialog page, and then select Cluster:

/images/blog/docker-compose/python-executable.png

Next create a new cluster configuration by pressing the New button. This displays the cluster configuration dialog. You will need to enter an identifier to use within Wing and point it at the docker-compose.yml file for the cluster. You will also need to select the main service to use as the default place to run your Python Shell and unit test processes:

/images/blog/docker-compose/cluster-config.png

Once you have created your cluster configuration, submit the New Project dialog to complete your project setup.

Working with the Cluster

You can now control your cluster from Wing's Containers tool, found in the Tools menu. This tool lists the services in your cluster and their status. You can right-click on items here or use the Options menu to build, start, debug, and stop your cluster:

/images/blog/docker-compose/containers-tool.png

Debug processes, unit tests, Wing's integrated Python Shell and OS Commands can all be run in context of the cluster, or optionally instead within isolated containers that match the cluster configuration but run without launching the whole cluster.

See Using Wing Pro with Docker Compose for more information on working with clusters in Wing Pro.



That's it for now! We'll be back soon with more Wing Tips for Wing Python IDE.

As always, please don't hesitate to email support@wingware.com if you run into problems, have any questions, or have topic suggestions for future Wing Tips!

September 21, 2021 01:00 AM UTC


Brett Cannon

(Not) unravelling generator expressions

In this post on Python's syntactic sugar, I want to try to tackle generator expressions. If you look at the language definition for generator expressions you will see that it says, "[a] generator expression yields a new generator object" for what is specified (which is essentially a compact for loop with an expression for the body). So what does that look like if you take away the Python "magic" and unravel it down to its core Python semantics?

The bytecode

Let's take the following example:

def spam():
    return (b for b in a)
Example generator expression

The bytecode for this is:

  1           0 LOAD_CONST               1 (<code object <genexpr> at 0x10076b500, file "<stdin>", line 1>)
              2 LOAD_CONST               2 ('spam.<locals>.<genexpr>')
              4 MAKE_FUNCTION            0
              6 LOAD_GLOBAL              0 (a)
              8 GET_ITER
             10 CALL_FUNCTION            1
             12 RETURN_VALUE

Disassembly of <code object <genexpr> at 0x10076b500, file "<stdin>", line 1>:
  1           0 LOAD_FAST                0 (.0)
        >>    2 FOR_ITER                10 (to 14)
              4 STORE_FAST               1 (b)
              6 LOAD_FAST                1 (b)
              8 YIELD_VALUE
             10 POP_TOP
             12 JUMP_ABSOLUTE            2
        >>   14 LOAD_CONST               0 (None)
             16 RETURN_VALUE
Bytecode for the example generator expression

You may notice a couple of things that are interesting about this:

  1. The generator expression is very much just a for loop in a generator.
  2. The generator expression is stored as a constant in the function.
  3. a gets explicitly passed into the generator expression.

The semantics

The explicit passing of a is the surprising bit in how generator expressions work, but it actually makes sense when you read the explanation as to why this occurs:

... the iterable expression in the leftmost for clause is immediately evaluated, so that an error produced by it will be emitted at the point where the generator expression is defined, rather than at the point where the first value is retrieved. Subsequent for clauses and any filter condition in the leftmost for clause cannot be evaluated in the enclosing scope as they may depend on the values obtained from the leftmost iterable.

So by passing a in, the code for a is evaluated at the time of creation of the generator expression, not at the time of execution. That way, if there's an error with that part of the code, the traceback will help you find where it was defined and not simply where the generator expression happened to be run. Since subsequent for loops in the generator expression may rely on the loop variable in the first clause, you can't eagerly evaluate any other parts of the expression.
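
You can see this eager evaluation directly in the REPL - the leftmost iterable blows up at definition time, before the generator is ever iterated:

>>> gen = (x for x in undefined_name)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'undefined_name' is not defined
The leftmost iterable is evaluated when the generator expression is defined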

The unravelling

There's a couple of details required to make unravelling a generator expression successful, so I'm going to build up a running example to cover all the various cases.

With only one for loop

Let's start with (c for b in a), where c can be some expression. To unravel this, we need to make a generator which takes in a as an argument to guarantee it is eagerly evaluated where the generator expression is defined.

def _gen_exp(_leftmost_iterable):
    for b in _leftmost_iterable:
        yield c
        
_gen_exp(a)
Unravelling (c for b in a)

We end up with a generator function which takes a single argument for the leftmost iterable. Let's see what this looks like in some code that would use the generator expression:

def spam(a, b):
    func(arg=(str(b) for b in a))
Example of using a generator expression

This would then unravel to:

def spam(a, b):
    def _gen_exp(_leftmost_iterable):
        for b in _leftmost_iterable:
            yield str(b)
    
    func(arg=_gen_exp(a))
Unravelling the generator expression usage example

With multiple for loops

Now let's toss in another for loop: (e for b in a for d in c). This unravels to:

def _gen_expr(_leftmost_iterable):
    for b in _leftmost_iterable:
        for d in c:
            yield e
(e for b in a for d in c) unravelled

Since only the leftmost iterable is evaluated eagerly, we can rely on the scoping rules for closures to get all of the other variables from the call site implicitly (this is where Python's simple namespace system comes in handy).

Putting this into an example like:

def spam():
    x = range(2)
    y = range(3)
    return ((a, b) for a in x for b in y)
Example using multiple for loops in a generator expression

leads to an unravelling of:

def spam():
    x = range(2)
    y = range(3)
    
    def _gen_exp(_leftmost_iterable):
        for a in _leftmost_iterable:
            for b in y:
                yield (a, b)
                
    return _gen_exp(x)
Unravelling of a generator expression with multiple for loops

The generator expression needs x passed in because it's the leftmost iterable, but everything else is captured by the closure.

Assignment expressions

Let's make life complicated and throw in an assignment expression:

def spam():
    list(b := a for a in range(2))
    return b
Example of a generator expression with an assignment expression

The bytecode for this becomes:

  2           0 LOAD_GLOBAL              0 (list)
              2 LOAD_CLOSURE             0 (b)
              4 BUILD_TUPLE              1
              6 LOAD_CONST               1 (<code object <genexpr> at 0x1008393a0, file "<stdin>", line 2>)
              8 LOAD_CONST               2 ('spam.<locals>.<genexpr>')
             10 MAKE_FUNCTION            8 (closure)
             12 LOAD_GLOBAL              1 (range)
             14 LOAD_CONST               3 (2)
             16 CALL_FUNCTION            1
             18 GET_ITER
             20 CALL_FUNCTION            1
             22 CALL_FUNCTION            1
             24 POP_TOP

  3          26 LOAD_DEREF               0 (b)
             28 RETURN_VALUE

Disassembly of <code object <genexpr> at 0x1008393a0, file "<stdin>", line 2>:
  2           0 LOAD_FAST                0 (.0)
        >>    2 FOR_ITER                14 (to 18)
              4 STORE_FAST               1 (a)
              6 LOAD_FAST                1 (a)
              8 DUP_TOP
             10 STORE_DEREF              0 (b)
             12 YIELD_VALUE
             14 POP_TOP
             16 JUMP_ABSOLUTE            2
        >>   18 LOAD_CONST               0 (None)
             20 RETURN_VALUE
Bytecode for example of generator expression with an assignment expression

The key thing to notice is the various *_DEREF opcodes which are what CPython uses to load/store nonlocal variables.

Now we could just add a nonlocal statement to our unravelled generator expression and assume we are done, but there is one issue to watch out for: has the variable previously been defined in the enclosing scope? If the variable doesn't exist when the scope with the nonlocal is defined (technically the compiler walking the AST has not seen the variable yet), Python will raise an exception: SyntaxError: no binding for nonlocal 'b' found.
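
A tiny demonstration of that compile-time check - here there is no binding for b in the enclosing function, so this won't even compile:

def outer():
    def inner():
        nonlocal b  # SyntaxError: no binding for nonlocal 'b' found
        b = 1
Demonstration of the nonlocal compile-time check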

Python gets to take a shortcut when it comes to a generator expression with an assignment expression and simply considers the nonlocal as implicit, without regard to whether the variable was previously defined. But we don't get to cheat, and that means we may have to define the variable with a dummy value to make the CPython compiler happy.

But we also have to deal with the case where the generator expression is never run, or runs but never sets b (i.e. the iterable has a length of 0). In the example, that would raise UnboundLocalError: local variable 'b' referenced before assignment. To replicate that, we need to delete b if it never gets set appropriately.

What all of this means is our example unravels to:

def spam():
    b = _PLACEHOLDER
    
    def _gen_exp(_leftmost_iterable):
        nonlocal b
        for a in _leftmost_iterable:
            yield (b := a)
            
    list(_gen_exp(range(2)))
    if b is _PLACEHOLDER:
        del b
    return b
Unravelling of generator expression example with an assignment expression

But remember, we only want to do any of this nonlocal work if there are assignment expressions to worry about.

The best laid plans ...

I actually wrote this entire post thinking I had solved the unravelling of generator expressions, and then I realized assignment expressions thwarted me in the end. Consider the following example:

def spam():
    return ((b := x for x in range(5)), b)
Example where the result of an assignment expression is relied upon in the same statement

If you run that example you end up with UnboundLocalError: local variable 'b' referenced before assignment. Now let's unravel this:

def spam():
    b = _PLACEHOLDER
    def _gen_expr(_leftmost_iterable):
        nonlocal b
        for x in _leftmost_iterable:
            yield (b := x)
            
    return _gen_expr(range(5)), b
Unravelling of the assignment expression reliance example

Unfortunately, calling this function succeeds. And since del is a statement, there's no way to insert ourselves into that expression to prevent b from being resolved. This means we cannot unravel assignment expressions. 🙁

So while I thought I had unravelled assignment expressions, it seems that in the end I was unsuccessful. But I have decided to publish this anyway to show how I typically approach unravelling a bit of syntax - and that sometimes I am unable to do it.

Aside: what came first, the expression or the comprehension?

If you have not been programming in Python for more than 15 years, you may think generator expressions came first, then list comprehensions. But actually it's the other way around: list comprehensions were introduced in Python 2.0 and generator expressions came in Python 2.4. This is because generators were introduced in Python 2.2 (thanks to the inspiration from Icon), and so the possibility of even having generator expressions didn't exist when list comprehensions came into existence (thanks to the inspiration from Haskell).

September 21, 2021 12:45 AM UTC

September 20, 2021


Real Python

Using the "and" Boolean Operator in Python

Python has three Boolean operators, or logical operators: and, or, and not. You can use them to check if certain conditions are met before deciding the execution path your programs will follow. In this tutorial, you’ll learn about the and operator and how to use it in your code.

In this tutorial, you’ll learn how to:

  • Understand the logic behind Python’s and operator
  • Build and understand Boolean and non-Boolean expressions that use the and operator
  • Use the and operator in Boolean contexts to decide the course of action of your programs
  • Use the and operator in non-Boolean contexts to make your code more concise

You’ll also code a few practical examples that will help you understand how to use the and operator to approach different problems in a Pythonic way. Even if you don’t use all the features of and, learning about them will allow you to write better and more accurate code.

Working With Boolean Logic in Python

Back in 1854, George Boole authored The Laws of Thought, which contains what’s known as Boolean algebra. This algebra relies on two values: true and false. It also defines a set of Boolean operations, also known as logical operations, denoted by the generic operators AND, OR, and NOT.

These Boolean values and operators are pretty helpful in programming. For example, you can construct arbitrarily complex Boolean expressions with the operators and determine their resulting truth value as true or false. You can use the truth value of Boolean expressions to decide the course of action of your programs.

In Python, the Boolean type bool is a subclass of int and can take the values True or False:

>>>
>>> issubclass(bool, int)
True
>>> help(bool)
Help on class bool in module builtins:

class bool(int)
    ...

>>> type(True)
<class 'bool'>
>>> type(False)
<class 'bool'>

>>> isinstance(True, int)
True
>>> isinstance(False, int)
True

>>> int(True)
1
>>> int(False)
0

As you can see in this code, Python implements bool as a subclass of int with two possible values, True and False. These values are built-in constants in Python. They’re internally implemented as integer numbers with the value 1 for True and 0 for False. Note that both True and False must be capitalized.

Along with the bool type, Python provides three Boolean operators, or logical operators, that allow you to combine Boolean expressions and objects into more elaborate expressions. Those operators are the following:

Operator  Logical Operation
and       Conjunction
or        Disjunction
not       Negation

With these operators, you can connect several Boolean expressions and objects to build your own expressions. Unlike other languages, Python uses English words to denote Boolean operators. These words are keywords of the language, so you can’t use them as identifiers.
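
For example, trying to use one of these keywords as an identifier fails right away:

>>>
>>> and = True
  File "<stdin>", line 1
    and = True
    ^
SyntaxError: invalid syntax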

In this tutorial, you’ll learn about Python’s and operator. This operator implements the logical AND operation. You’ll learn how it works and how to use it either in a Boolean or non-Boolean context.

Getting Started With Python’s and Operator

Python’s and operator takes two operands, which can be Boolean expressions, objects, or a combination. With those operands, the and operator builds more elaborate expressions. The operands in an and expression are commonly known as conditions. If both conditions are true, then the and expression returns a true result. Otherwise, it returns a false result:

>>>
>>> True and True
True

>>> False and False
False

>>> True and False
False

>>> False and True
False

These examples show that an and expression only returns True when both operands in the expressions are true. Since the and operator takes two operands to build an expression, it’s a binary operator.

The quick examples above show what’s known as the and operator’s truth table:

operand1  operand2  operand1 and operand2
True      True      True
True      False     False
False     False     False
False     True      False

This table summarizes the resulting truth value of a Boolean expression like operand1 and operand2. The result of the expression depends on the truth values of its operands. It’ll be true if both are true. Otherwise, it’ll be false. This is the general logic behind the and operator. However, this operator can do more than that in Python.
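
As a quick preview of that extra behavior, which the non-Boolean sections cover in depth: with non-Boolean operands, and returns one of the operands themselves rather than a strict True or False:

>>>
>>> 0 and 42    # the left operand is falsy, so it's returned
0
>>> "a" and 42  # the left operand is truthy, so the right operand is returned
42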

In the following sections, you’ll learn how to use and for building your own expressions with different types of operands.

Using Python’s and Operator With Boolean Expressions

You’ll typically use logical operators to build compound Boolean expressions, which are combinations of variables and values that produce a Boolean value as a result. In other words, Boolean expressions return True or False.

Read the full article at https://realpython.com/python-and-operator/ »



September 20, 2021 02:00 PM UTC


Stack Abuse

Guide to Python's keyboard Module

Introduction

Python is one of the most suitable languages for automating tasks. Whether it's repeatable (ethical) web scraping after some time period, starting some programs on a computer start up, or automating sending mundane e-mails, Python has a lot of modules that make your life easier.

One of these is a module called keyboard, and it takes full control of your keyboard. With this module, you can type out anything, create hot-keys, create abbreviations, block the keyboard, wait for input, etc.

In this guide, we'll take a look at how to set up and use the keyboard module in Python.

Note: Applications that automate human-like processes should be developed ethically and responsibly. The keyboard module is designed to be very observable, which makes it both discouraged and transparent to use for creating keyloggers or malicious bots.

Installing the keyboard Module

Note: The version of Python used in this guide is 3.8. However, the keyboard module can work with both Python 2.x and Python 3.x.

If you're using Linux, you must install this library as root in order to use it. If you don't, you'll get an error:

ImportError: You must be root to use this library on linux.

Also, when running your script, you should run it with root privileges:

$ sudo pip3 install keyboard
$ sudo python3 my_script.py

On Windows and MacOS, where privileges work differently, you can install it simply via pip and run the scripts:

$ pip install keyboard
$ python my_script.py

Note: For MacOS, you might have to allow the Terminal or other apps to change the state of your machine, such as by typing. Also keep in mind that as of September 2021, the library is still experimental on MacOS.

The keyboard Module's Functions

There are a lot of functions in this module that can be used to simulate keyboard actions.

We'll go through all of these, but first, here's a quick example:

>>> import keyboard
>>> keyboard.write("Hello")
>>> Hello

The Hello message appears on the screen, in the terminal, as if you'd typed it yourself. You can automate a command very easily and create a hotkey alias for it. Here's a (crude) example of exiting the Python REPL, writing a curl command and executing it:

>>> import keyboard
>>> keyboard.write("exit()"); keyboard.send("enter"); keyboard.write("curl https://www.google.com"); keyboard.send("enter");
>>> exit()
curl https://www.google.com
$ curl https://www.google.com
<!doctype html><html itemscope=""...

keyboard's write() and wait() Functions

The write() function types out a message, as we've seen before, with an optional delay between keystrokes. If no delay is set, writing is instant. It combines very nicely with the wait() function, which blocks until a certain key is pressed.
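
For instance, write() accepts an optional delay argument, the number of seconds to pause between each character. A minimal sketch, with an arbitrary half-second delay:

import keyboard

# Type the message with 0.5 seconds between each character
keyboard.write("Slow and steady", delay=0.5)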

As another example, we can create a makeshift macro tied to, say, the 1 key, which responds to that input with a new message. Note that there's a proper way to create hotkeys instead of this, which we'll cover later.

We'll create an infinite while True loop that waits for the key to be pressed, and you can run the script in the background:

import keyboard

while True:
    keyboard.wait("1")
    keyboard.write("\n The key '1' was pressed!")

Note: Special characters are not supported by this function, so if you add, say, ! - you'll get hit with a StopIteration exception.

keyboard's press(), release() Functions

Since it's hard to simulate press() and release() so that the actions are visible, we'll also see record() and play() in action.

The press() function presses a key, and the key stays pressed until you call release() on the same key. Note that calling sleep() in between won't simulate repeatedly typing a held-down key:

>>> import keyboard
>>> from time import sleep
>>> keyboard.press("a")
>>> sleep(1)
>>> keyboard.release("a")
>>> a

However, you can hold down some special keys, such as [SHIFT] or [CTRL] this way:

>>> keyboard.press("shift")
>>> keyboard.write("lowercase")
>>> keyboard.release("shift")

>>> LOWERCASE

keyboard's record() and play() Functions

It's not always about inputting new keys in - sometimes, you'd like to record what's going on and play it back. Keep in mind that you'll need administrator privileges to record any input like this, as the technology can easily be used to create key loggers.

The record() function accepts a trigger key, until which it records, and returns a sequence of events of type KeyboardEvent. You can then chuck this sequence of events into the play() function, which faithfully replays them, with an optional speed_factor argument that acts as a multiplier for the speed of the original events:

import keyboard
recorded_events = keyboard.record("esc")
keyboard.play(recorded_events)

If we were to print the recorded_events, they'd look something like this:

[KeyboardEvent(w up), KeyboardEvent(o down), ...]

The effects of these methods are best seen as a GIF or recreated on your machine. For instance, you could record a sequence of writing a message, deleting it, and writing a different one instead.

keyboard's send() Function

The send() function encompasses press() and release() together, and is used for single keys, unlike write() which is used for entire sentences:

import keyboard

recorded_events = keyboard.record("s")

keyboard.send("w")
keyboard.send("a")

keyboard.play(recorded_events)

Once s is pressed, recording stops, the w and a keys are sent, and then the recorded events are replayed.

The press() function can also accept combinations of pressed keys. You can send a combination like "ctrl+shift+s", for instance, and the dialog for saving a file should pop up, if you're in an application that supports that operation:

import keyboard

while True:
    keyboard.wait("s")
    keyboard.press("ctrl+shift+s")
    # Or for MacOS:
    # keyboard.press("command+shift+s")

However, this isn't the right way to add hotkeys. Instead, you can use the add_hotkey() function.

keyboard's add_abbreviation() Function

The add_abbreviation() function is a pretty nifty one, as it allows you to define abbreviations for long inputs, and replaces the abbreviated versions with the saved full versions.

For instance, similar to how services like Google save your email for most input forms, you can create your own abbreviation and trigger it via [SPACE]:

>>> import keyboard
>>> keyboard.add_abbreviation("@", "john@stackabuse.com")

While running, if you type @ followed by a [SPACE] - the long-form input will replace the typed @.
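
Keep in mind that abbreviations only work while the script is running, so you'll usually want to keep it alive. A minimal sketch, where "esc" is an arbitrary choice of exit key:

import keyboard

keyboard.add_abbreviation("@", "john@stackabuse.com")

# Keep the script alive so the abbreviation hook stays active;
# pressing Esc (an arbitrary choice) ends the script
keyboard.wait("esc")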

keyboard's add_hotkey() Function

The add_hotkey() function accepts a hotkey you'd like to save, or a combination of keys, and a function. It's easy to pass in anonymous lambda functions here, though you can also add named functions.

For instance, let's add a hotkey for ctrl+alt+j, which triggers a lambda function that logs this:

import keyboard

keyboard.add_hotkey("ctrl+alt+j", lambda: print("ctrl+alt+j was pressed"))

The hotkey, ctrl+alt+j, is saved, and when you press this combination, you should see the output of the lambda.
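
You can also pass a named function instead of a lambda, and keep the script listening for the hotkey. A minimal sketch, where the function name and the Esc exit key are arbitrary choices:

import keyboard

def on_hotkey():
    # Called every time the combination is pressed
    print("ctrl+alt+j was pressed")

keyboard.add_hotkey("ctrl+alt+j", on_hotkey)

# Block until Esc is pressed so the hotkey stays registered
keyboard.wait("esc")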

Conclusion

The keyboard module is a lightweight and simple library used for simulating keystrokes and simple automation in Python. It's not very feature-rich, but can be used to automate some of the tasks you might be performing in your day-to-day work, or simply for a bit of fun.

A more mature, powerful module that can be used as an alternative is pynput.

September 20, 2021 10:30 AM UTC


Mike Driscoll

PyDev of the Week: Tonya Sims

This week we welcome Tonya Sims (@TonyaSims) as our PyDev of the Week! Tonya is a Python Developer Advocate for Vonage and is an active member of the Real Python community. Tonya recently gave a talk called “Faceoff Fun with Python Frameworks: FastAPI vs Flask” at EuroPython 2021.

Tonya Sims

Let’s spend some time getting to know Tonya better!

Can you tell us a little about yourself (hobbies, education, etc):

Sure! I’m a former pro athlete turned Pythonista. I played professional women’s basketball in Europe and in the WNBA.

As a kid growing up, my parents bought a computer, a Texas Instruments, and the first time I ever coded was on that machine, writing BASIC. Dang! I totally wish I would have stuck with it, but school and sports dominated my life at that time.

In college at the University of Wisconsin, I majored in Business while playing for the Women’s Basketball Team. After playing professionally, I started working as a pharmaceutical sales rep. While I definitely respected the profession I felt it wasn’t a good fit for my creative and analytical strengths, so I transitioned out and moved to Chicago. This is where I picked up coding again.

My first tech job was working for a financial services company but not as a technologist of any sort. I was an executive assistant temp, assisting the vice president of our division within their tech department. Basically, my role was to answer phone calls, take notes and order pizzas for the office (LOL).

This couldn’t be my life! So I started learning basic UNIX commands so I could make a move. I was eventually able to move into a computer operator role at that company, swapping out tapes in a ginormous data center. It was a good start for my tech career but I didn’t want to end up getting pigeonholed as a tape swapper.

This is when I taught myself how to code.

I started with Visual Basic.NET (NOPE!), moved to C# (NOPE!), learned Java in graduate school (NOPE!), transitioned to Ruby (NOPE!). Then something magical happened.

I discovered Python …and the rest is history.

Over the years I’ve worked in different roles in tech from a software engineer to software engineer in test to build engineer.

Currently, I’m in my dream role at Vonage as a Python Developer Advocate.

Why did you start using Python?

I started using Python for many different reasons.

It just clicked for me. I’m not sure what it was about the other languages I’ve dabbled in but when I started learning Python the coding concepts started making more sense. Not to say I magically learned Python in 24 hours (isn’t there a book that promises that?). My Python journey definitely took some time and I’m still learning new things every single day!

The Python community is hands down the best. I felt so welcomed and supported. People were cheering for me and I for them. Getting help and asking questions is also extremely encouraged. The open-source aspect of Python also appealed to me and that’s one of the reasons I believe the community rocks!

What other programming languages do you know and which is your favorite?

I know some JavaScript and am taking an online bootcamp on Udemy to learn it. It seems there’s always some competition on Twitter between Python and JavaScript, but I feel these two really work well together.

In my role at Vonage, I maintain the Python SDKs and APIs and oftentimes have to build frontends to test my changes, which is normally done in JavaScript.

I definitely encourage people to learn multiple languages, especially Python and JavaScript.

What projects are you working on now?

Currently, I’m working on building a sports index for the NBA, WNBA, NFL, MLB, and college football and basketball. The idea is that it would crawl the web and rank players from best to worst based on stats and other factors in real time. I’m also building in sentiment analysis to detect positive or negative news. All of this is important for fantasy sports competitors in making their player selections.

Which Python libraries are your favorite (core or 3rd party)?

I work a lot with the Requests library, and have gotten to know it even better over the last several months while developing and calling APIs. It’s not the most glamorous library, but it definitely gets the job done when it comes to HTTP request calls.

I’m also dabbling in PyGame. I’ve always wanted to make games especially with the increase of gamification used in applications nowadays. Excited to see where this journey takes me!

I see you are a Python Developer Advocate. Can you describe what that is and what you do?

This is my first role as a Python Developer Advocate and it’s truly my dream job! In my role I have a 70/30% split, so most of my time is spent coding in Python. For example, maintaining and building out new features for our APIs.

The other 30% of my days are spent submitting Call For Papers (CFP) proposals to conferences and giving tech talks around the globe.

As a Python Developer Advocate, it is my job to nurture the Python community and to make sure that both new and existing users have support for our products and services.

What made you decide to become a speaker at Python Conferences?

It’s something I’ve always wanted to do and I LOVE teaching.

I’ve taught Python in the past and I absolutely enjoy sharing knowledge about something I’ve learned.

Thanks for doing the interview, Tonya!

The post PyDev of the Week: Tonya Sims appeared first on Mouse Vs Python.

September 20, 2021 05:05 AM UTC

September 19, 2021


Podcast.__init__

Experimenting With Reinforcement Learning Using MushroomRL

Summary

Reinforcement learning is a branch of machine learning and AI that has a lot of promise for applications that need to evolve with changes to their inputs. To support the research happening in the field, including applications for robotics, Carlo D’Eramo and Davide Tateo created MushroomRL. In this episode they share how they have designed the project to be easy to work with, so that students can use it in their study, as well as extensible so that it can be used by businesses and industry professionals. They also discuss the strengths of reinforcement learning, how to design problems that can leverage its capabilities, and how to get started with MushroomRL for your own work.

Announcements

  • Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science.
  • When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
  • Your host as usual is Tobias Macey and today I’m interviewing Davide Tateo and Carlo D’Eramo about MushroomRL, a library for building reinforcement learning experiments

Interview

  • Introductions
  • How did you get introduced to Python?
  • Can you start by describing what reinforcement learning is and how it differs from other approaches for machine learning?
  • What are some example use cases where reinforcement learning might be necessary?
  • Can you describe what MushroomRL is and the story behind it?
    • Who are the target users of the project?
    • What are its main goals?
  • What are your suggestions to other developers for implementing a successful library?
  • What are some of the core concepts that researchers and/or engineers need to understand to be able to effectively use reinforcement learning techniques?
  • Can you describe how MushroomRL is architected?
    • How have the goals and design of the project changed or evolved since you began working on it?
  • What is the workflow for building and executing an experiment with MushroomRL?
    • How do you track the states and outcomes of experiments?
  • What are some of the considerations involved in designing an environment and reward functions for an agent to interact with?
  • What are some of the open questions that are being explored in reinforcement learning?
  • How are you using MushroomRL in your own research?
  • What are the most interesting, innovative, or unexpected ways that you have seen MushroomRL used?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on MushroomRL?
  • When is MushroomRL the wrong choice?
  • What do you have planned for the future of MushroomRL?
  • How can the open-source community contribute to MushroomRL?
  • What kind of support are you willing to provide to users?

Keep In Touch

Picks

Closing Announcements

  • Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast, for the latest on modern data management.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story.
  • To help other people find the show please leave a review on iTunes and tell your friends and co-workers
  • Join the community in the new Zulip chat workspace at pythonpodcast.com/chat

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

September 19, 2021 08:50 PM UTC


Łukasz Langa

Weekly Report 2021, September 13 - 19

This week in numbers: closed 8 issues, authored 1 PR, closed 49 PRs, and reviewed 6. No highlights this time, since I had hoped to squeeze in some work on Saturday, but that turned out not to be possible (it’s birthday season in my family).

September 19, 2021 08:20 PM UTC


Mike Driscoll

Python 101 – An Intro to Jupyter Notebook

The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain code, equations, visualizations, and formatted text. Jupyter Notebook runs Python out of the box, and it supports many other programming languages via extensions. You can use the Jupyter Notebook for data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more!

In this chapter, you will learn about the following:

  • Installing The Jupyter Notebook
  • Creating a Notebook
  • Running Cells
  • Learning About the Menus
  • Adding Content
  • Adding an Extension
  • Exporting Notebooks to Other Formats

This chapter is not meant to be a comprehensive tutorial on the Jupyter Notebook. Instead, it will show you the basics of how to use a Notebook and why it might be useful. If you are intrigued by this technology, you might want to check out my book on the topic, Jupyter Notebook 101.

Let’s get started!

Installing The Jupyter Notebook

Jupyter Notebook does not come with Python, so you will need to install it using pip. If you are using Anaconda instead of the official Python distribution, then Jupyter Notebook comes pre-installed with Anaconda.

Here is how you would install Jupyter Notebook with pip:

python3 -m pip install jupyter

When you install Jupyter Notebook, it will install a lot of other dependencies. You may want to install Jupyter Notebook into a Python virtual environment. See Chapter 21 for more information.
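
If you do go the virtual environment route, the commands might look something like this, where the environment name jupyter-env is just an example:

$ python3 -m venv jupyter-env
$ source jupyter-env/bin/activate
$ python3 -m pip install jupyter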

Once the installation is done, you are ready to create a Jupyter Notebook!

Creating a Notebook

To create a Notebook, you first need to start Jupyter's server, which comes included with your installation. Before you can do anything with Jupyter, you must launch this Jupyter Notebook Server by running the following command:

jupyter notebook

This command will either launch your default browser or open up a new tab in it, depending on whether your browser is already running. In both cases, you will soon see a new tab that points to the following URL: http://localhost:8888/tree. Your browser should load a page that looks like this:

Jupyter Server

Here you can create a Notebook by clicking the New button on the right:

Creating a Jupyter Notebook

You can create a Notebook with this menu as well as a text file, a folder, and an in-browser terminal session. For now, you should choose the Python 3 option.

Having done that, a new tab will open with your new Notebook loaded:

A New Notebook

Now let’s learn about how to interact with the Notebook!

Naming Your Notebook

The top of the Notebook says that it is Untitled. To fix that, all you need to do is click on the word Untitled and an in-browser dialog will appear:

Renaming a Notebook

When you rename the Notebook, it will also rename the file that the Notebook is saved to so that it matches the name you gave it. You can name this Notebook “Hello World”.

Running Cells

Jupyter Notebook cells are where the magic happens. This is where you can create content and interactive code. By default, the Notebook will create cells in code mode. That means that it will allow you to write code in whichever kernel you chose when you created the Notebook. A kernel is the engine that executes code in the programming language you chose when creating your Jupyter Notebook. You chose Python 3 when you created this Notebook, so you can write Python 3 code in the cell.

Right now the cell is empty, so it doesn’t do anything at all. Let’s add some code to change that:

print('Hello from Python!')

To execute the contents of a cell, you need to run that cell. After selecting the cell, there are three ways of running it:

  • Click the Run button in the toolbar
  • Open the Cell menu and choose Run Cells
  • Press Shift+Enter (or Ctrl+Enter) on your keyboard

When you run this cell, the output should look like this:

Running a Jupyter Notebook Cell

Jupyter Notebook cells remember the order in which they are run. If you run the cells out of order, you may end up with errors because something you depend on hasn't been imported or defined yet. However, when you do run the cells in order, you can write imports in one cell and still use those imports in later cells. Notebooks make it simple to keep logical pieces of the code together. In fact, you can put explanatory cells, graphs, and more between the code cells, and the code cells will still share their state with each other.

When you run a cell, a pair of brackets next to the cell will fill in with a number. This indicates the order in which the cells were run. In this example, when you ran the first cell, the brackets filled in with the number one. Because all code cells in a notebook operate on the same global namespace, it is important to be able to keep track of the order of execution of your code cells.
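
For example, since all code cells share one global namespace, a name defined in an earlier cell is available in a later cell, provided the earlier cell has already been run. A minimal two-cell sketch:

# Cell 1: define some names
import math
radius = 5

# Cell 2: uses the names from Cell 1, so run Cell 1 first
area = math.pi * radius ** 2
print(f"The area is {area:.2f}")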

Learning About the Menus

There is a menu in the Jupyter Notebook that you can use to work with your Notebook. The menu runs along the top of the Notebook. Here are your menu options:

  • File
  • Edit
  • View
  • Insert
  • Cell
  • Kernel
  • Widgets
  • Help

Let’s go over each of these menus. You don’t need to know about every single option in these menus to start working with Jupyter, so this will be a high-level overview.

The File menu is used for opening a Notebook or creating a new one. You can also rename the Notebook here. One of the nice features of Notebooks is that you can create Checkpoints. Checkpoints allow you to roll back to a previous state. To create a Checkpoint, go to the File menu and choose the Save and Checkpoint option.

The Edit menu contains your regular cut, copy, and paste commands, which you can use on a cell level. You can also delete, split, or merge cells from here. Finally, you can use this menu to reorder the cells.

You will find that some of the options here are grayed out. The reason an item is grayed out is that the option does not apply to the currently selected cell in your Notebook. For example, if you selected a code cell, you won't be able to insert an image. Try changing the cell type to Markdown to see how the options change.

The View menu is used for toggling the visibility of the header and the toolbar. This is also where you would go to toggle Line Numbers on or off.

The Insert menu is used for inserting cells above or below the currently selected cell.

The Cell menu is useful for running one cell, a group of cells, or everything in the Notebook! You can change the cell type here, but you will probably find the toolbar more intuitive to use than the menu for that sort of thing.

Another useful feature of the Cell menu is that you can use it to clear the cell’s output. A lot of people share their Notebooks with others. If you want to do that, it can be useful to clear out the outputs of the cells so that your friends or colleagues can run the cells themselves and discover how they work.

The Kernel menu is for working with the Kernel itself. The Kernel refers to the programming language plugin. You will occasionally need to restart, reconnect or shut down your kernel. You can also change which kernel is running in your Notebook.

You won’t use the Kernel menu all that often. However, when you need to do some debugging in Jupyter Notebook, it can be handy to restart the Kernel rather than restarting the entire server.

The Widgets menu is for clearing and saving widget state. A Widget is a way to add dynamic content to your Notebook, like a button or slider. These are written in JavaScript under the covers.

The last menu is the Help menu. This is where you will go to learn about the special keyboard shortcuts for your Notebook. It also provides a user interface tour and plenty of reference material that you can use to learn how to better interact with your Notebook.

Now let’s learn how to create content in your Notebook!

Adding Content

You can choose between two primary types of content for your Notebooks:

  • Code
  • Markdown

There are technically two other cell types you can choose. One is Raw NBConvert, which is only intended for special use cases when using the nbconvert command line tool. This tool is used to convert your Notebook to other formats, such as PDF.

The other type is Heading, which actually isn’t used anymore. If you choose this cell type, you will receive the following dialog:

Heading Types

You have already seen how to use the default cell type, Code. So the next section will focus on Markdown.

Creating Markdown Content

The Markdown cell type allows you to format your text. You can create headings, add images and links, and format your text with italics, bold, etc.

This chapter won’t cover everything you can do with Markdown, but it will teach you the basics. Let’s take a look at how to do a few different things!

Formatting Your Text

If you would like to add italics to your text, you can use single underscores or single asterisks. If you would rather bold your text, then you double the number of asterisks or underscores.

Here are a couple of examples:

You can italicize like *this* or _this_

Or bold like **this** or __this__

Try setting your Notebook cell to Markdown and adding the text above to it. You will then see that the Notebook is automatically formatting the text for you:

Formatting text

When you run the cell, it will format the text nicely:

Formatted Text (after run)

If you need to edit the cell again, you can double-click the cell and it will go back into editing mode.

Now let’s find out how to add heading levels!

Using Headings

Headings are good for creating sections in your Notebook, just like they are when you are creating a web page or a document in Microsoft Word. To create headings in Markdown, you can use one or more # signs.

Here are some examples:

# Heading 1
## Heading 2
### Heading 3
#### Heading 4

If you add the code above to a Markdown cell in your Notebook, it will look like this:

Markdown Headings

You can see that the Notebook is already generating a type of preview for you here by shrinking the text slightly for each heading level.

When you run the cell, you will see something like the following:

Markdown Headings (after running)

As you can see, Jupyter nicely formats your text as different-level headings that can be helpful to structure your text.

Adding a Listing

Creating a listing or bullet points is pretty straightforward in Markdown. To create a listing, you add an asterisk (*) or a dash (-) to the beginning of the line.

Here is an example:

* List item 1
 * sub item 1
 * sub item 2
* List item 2
* List item 3

Let’s add this code to your Notebook:

Adding Listings in Markdown

You don’t really get a preview of listings this time, so let’s run the cell to see what you get:

Listings in Markdown (after run)

That looks pretty good! Now let’s find out how to get syntax highlighting for your code!

Highlighting Code Syntax

Notebooks already allow you to show and run code and they even show syntax highlighting. However, this only works for the Kernels or languages installed in Jupyter Notebook.

If you want to show code for another language that is not installed or if you want to show syntax highlighting without giving the user the ability to run the code, then you can use Markdown for that.

To create a code block in Markdown, you use three backticks followed by the language that you want to show. If you want inline code highlighting, surround the code snippet with single backticks. However, keep in mind that inline code doesn't support syntax highlighting.
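
For example, here is what the Markdown source for a fenced Python code block and an inline snippet looks like:

```python
print("Hello from a highlighted code block!")
```

You can also mention `inline_code` in a sentence.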

Here are two examples in the Notebook:

Syntax Highlighting in Markdown

When you run the cell, the Notebook transforms the Markdown into the following:

Syntax Highlighting (after run)

Here you can see how the code now has syntax highlighting.

Now let’s learn how to generate a hyperlink!

Creating a Hyperlink

Creating hyperlinks in Markdown is quite easy. The syntax is as follows:

[text](URL)

So if you wanted to link to Google, you would do this:

[Google](https://www.google.com)

Here is what the code looks like in the Notebook:

Hyperlink Markdown

When you run the cell, you will see the Markdown turned into a regular hyperlink:

Hyperlink Markdown (after run)

As you can see, the Markdown has been transformed into a traditional hyperlink.

Let’s find out about Jupyter extensions next!

Adding an Extension

Jupyter Notebook has lots of functionality right out of the box. If you need anything beyond that, you can also add new features through extensions from a large extension ecosystem. There are four different types of extensions available:

Most of the time, you will want to install a Notebook extension.

An extension for Jupyter Notebook is technically a JavaScript module that will be loaded in the Notebook’s front-end to add new functionality or make the Notebook look different. If you know JavaScript, you can write your own extension!

If you need to add something new to Jupyter Notebook, you should use Google to see if someone has written something that will work for you. The most popular extension is actually a large set of extensions called jupyter_contrib_nbextensions, which you can get here:

Most good extensions can be installed using pip. For example, to install the one mentioned above, you can run this command:

$ pip install jupyter_contrib_nbextensions

There are a few that are not compatible with pip. In those cases, you can use Jupyter itself to install the extension:

$ jupyter nbextension install NAME_OF_EXTENSION

While this installs the extension for Jupyter to use, it does not make the extension active yet. If you install an extension using this method, you will need to enable it before you can use it.

To enable an extension, you need to run the following command:

$ jupyter nbextension enable NAME_OF_EXTENSION

If you installed the extension while you were running Jupyter Notebook, you may need to restart the Kernel or the entire server to be able to use the new extension.

You may want to get the Jupyter NbExtensions Configurator extension to help you manage your extensions. It is a neat extension designed for enabling and disabling other extensions from within your Notebook’s user interface. It also displays the extensions that you have currently installed.

Exporting Notebooks to Other Formats

After you have created an amazing Notebook, you may want to share it with other people who are not as computer savvy as you are. Jupyter Notebook supports converting Notebooks to several other formats, including:

  • HTML
  • LaTeX
  • PDF
  • Reveal.js slideshows
  • Markdown
  • reStructuredText
  • Executable scripts

You can convert a Notebook using the nbconvert tool that was installed when you originally installed Jupyter Notebook. To use nbconvert, you can do the following:

$ jupyter nbconvert <notebook file> --to <output format>

Let’s say you want to convert your Notebook to PDF. To do that, you would do this:

$ jupyter nbconvert my_notebook.ipynb --to pdf

You will see some output as it converts the Notebook into a PDF. The nbconvert tool will also display any warnings or errors that it encounters during the conversion. If the process finishes successfully, you will have a my_notebook.pdf file in the same folder as the Notebook file.

The Jupyter Notebook provides a simpler way to convert your Notebooks too. You can do so from the File menu within the Notebook itself. You can choose the Download as option to do the conversion.

Depending on the platform that you are on, you may need to install LaTeX or other dependencies to get certain export formats to work properly.

Wrapping Up

The Jupyter Notebook is a fun way to learn how to use Python or machine learning. It is a great way to organize your data so that you can share it with others. You can use it to create presentations, show your work, and run your code.

In this article, you learned about the following:

  • Installing The Jupyter Notebook
  • Creating a Notebook
  • Running Cells
  • Learning About the Menus
  • Adding Content
  • Adding an Extension
  • Exporting Notebooks to Other Formats

You should give Jupyter Notebook a try. It’s a useful coding environment and well worth your time.

Related Articles

Learn more about what you can do with Jupyter Notebook in these articles:

The post Python 101 – An Intro to Jupyter Notebook appeared first on Mouse Vs Python.

September 19, 2021 12:30 PM UTC