Generators

journey into python


Generators are a special kind of function that returns what is called a lazy iterator. Lazy iterators are objects that you can loop over like a list. But, unlike lists, lazy iterators do not store all of their contents in your system’s memory. Maybe you have a complicated function that needs to remember where it left off each time it’s called. Or have you ever had to analyze data so huge that your FPS dropped to 0? This is where generators and the Python yield statement come into play. If you want to optimize your data functions, then you are in the right place. Do you want a better understanding of iterators in Python? Then take a look at our posts Python For Loops and Python While Loops at Introduction to Python.

What is a Generator?

Generators allow you to declare a function that behaves like an iterator, i.e. it can be used in a for loop, but it does not save all of its contents to your system’s memory. This definition comes from Python.org, which has some additional reading.

Course Objectives:

  • Learn to use Generators on large files
  • Make an infinite loop
  • Test whether a generator or iterator should be used
  • More on the yield statement
  • Learn a few generator methods
  • Cheat Sheet

Learn to use Generators on large files

With a better understanding of generators, let’s take a look at the first example below. If we wanted to find out the number of lines in this CSV file, we would iterate over the file, count each row, and print the total.

# Open the file (it is closed automatically) and count its rows one at a time
with open("sample_data/california_housing_test.csv") as csvFile:
    rowCount = 0
    for row in csvFile:
        rowCount += 1

print(f"Row count is {rowCount}")
Row count is 3001

But what if we needed to iterate over a larger file with an undetermined number of rows? If you try to read a large file into memory all at once, you will see that it takes time to run, and sometimes it even causes your computer to hang. So how do we iterate over a large file then? The answer is generators using the yield statement. For comparison, first look at a version that reads the whole file into memory and returns it.

def csvReader(file_name):
    # Read every line into a list at once, so the whole file sits in memory
    with open(file_name) as thefile:
        result = thefile.readlines()
    return result

csvFile = csvReader("sample_data/california_housing_train.csv")
rowCount = 0

for row in csvFile:
    rowCount += 1

print(f"Row count is {rowCount}")
Row count is 17001

The example below uses a generator and the yield statement to accomplish the same iteration as above. First, you will notice that the yield statement has to be inside a function to work. Additionally, the amount of memory used is a lot less. This is because as soon as the function gets a row, it hands that row back and then continues with the next one. With a return statement, by contrast, the code would read the whole file, store the data, then give back the results and stop.

def csvReader(file_name):
    for row in open(file_name, "r"):
        yield row

csvFile = csvReader("sample_data/california_housing_train.csv")
rowCount = 0

for row in csvFile:
    rowCount += 1

print(f"Row count is {rowCount}")
Row count is 17001
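
If you want to try the idea without the sample data, here is a minimal sketch. It assumes a small throwaway CSV written to a temporary file (the column names and values are made up for illustration), and it uses a with block so the file is closed once the generator finishes. The name csv_reader is just a renamed variant of the function above.

```python
import os
import tempfile


def csv_reader(file_name):
    """Yield one row at a time instead of loading the whole file."""
    with open(file_name, "r") as csv_file:
        for row in csv_file:
            yield row


# Build a small throwaway file so the example runs anywhere (made-up data).
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("longitude,latitude\n")
    for i in range(1000):
        tmp.write(f"{i},{i}\n")
    sample_path = tmp.name

# Only one row is held in memory at any moment while counting.
row_count = sum(1 for _ in csv_reader(sample_path))
print(f"Row count is {row_count}")  # 1 header row + 1000 data rows

os.remove(sample_path)  # clean up the temporary file
```

The count comes out to 1001 here because the header line is counted as a row too.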

Just as you can write list comprehensions, you can also write generator expressions (sometimes called generator comprehensions). This can keep you from having to create a function to iterate over a file. The example function above can also be written as one line.

csv_gen = (row for row in open("sample_data/california_housing_train.csv"))
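
One thing worth knowing about generator expressions is that they can only be consumed once. The short sketch below uses a made-up squares generator rather than a file so it runs on its own, and passes it straight to sum().

```python
# A generator expression over a small range (illustrative values only).
squares = (n * n for n in range(5))

first_total = sum(squares)   # consumes the generator: 0 + 1 + 4 + 9 + 16
second_total = sum(squares)  # nothing left to consume, so the sum is 0

print(first_total)
print(second_total)
```

If you need to loop over the values a second time, create the generator expression again or use a list instead.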

Make an infinite loop

Let’s take a break from iterating over files and dig a little more into generators. Let’s create a generator function that will run forever unless you stop it. With that being said, let’s see what happens with the return statement and then with the yield statement.

def function_infinite_sequence():
  num = 0
  while True:
    return num
    num += 1

print(type(function_infinite_sequence()))
for i in function_infinite_sequence():
  print(i)
<class 'int'>

---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

 in ()
      6 
      7 print(type(function_infinite_sequence()))
----> 8 for i in function_infinite_sequence():
      9   print(i)

TypeError: 'int' object is not iterable

def generator_infinite_sequence():
  num = 0
  while True:
    yield num
    num +=1

print(type(generator_infinite_sequence()))
for i in generator_infinite_sequence():
  print(i)
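
Since the for loop above never ends on its own, one way to look at just a few values is itertools.islice from the standard library. This is only a sketch of one option; the cutoff of five values is an arbitrary choice for the example.

```python
from itertools import islice


def generator_infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1


# islice stops asking for values after five, so the infinite loop never runs away.
first_five = list(islice(generator_infinite_sequence(), 5))
print(first_five)  # [0, 1, 2, 3, 4]
```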

What’s the difference between return and yield

In the examples above, the first is a function with a return statement and the second is a function with a yield statement. Notice that the return and yield statements sit in the same place. However, the function with return just returns an integer, while the function with yield creates a generator. You also can’t use a for loop with the return version, because return exits the function on the first pass and hands back a single integer, which is not iterable. With the yield version, you can also use next() to step through the generator one value at a time for testing if needed. This will give you whatever value the yield statement produces at that point.

step = generator_infinite_sequence()
next(step)
0
next(step)
1
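
Another difference that is easy to miss: calling a generator function does not run any of its body. Nothing happens until the first next(). The sketch below records this with a hypothetical count_up_to generator and a log list, both invented for the illustration.

```python
log = []  # records when the generator body actually starts running


def count_up_to(limit):
    log.append("started")
    num = 0
    while num < limit:
        yield num
        num += 1


gen = count_up_to(2)
log_before_next = list(log)  # still empty: the body has not run yet

first = next(gen)            # runs the body up to the first yield
log_after_next = list(log)   # now contains "started"

print(log_before_next, first, log_after_next)
```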

Test whether a generator or iterator should be used

Sometimes when you are creating iterators you have to choose between speed and memory usage. In the example below we have a list comprehension and a generator expression. One will create a list and the other will create a generator object. With that information you can check which one is bigger in size and which one is faster. This can be done by importing sys and cProfile to gather this valuable information.

nums_squared_lc = [num**2 for num in range(5)]
nums_squared_gc = (num**2 for num in range(5))
print(nums_squared_lc)
print(nums_squared_gc)
[0, 1, 4, 9, 16]
<generator object <genexpr> at 0x7f8552809dd0>

import sys
import cProfile

nums_squared_lc = [num**2 for num in range(10000)]
nums_squared_gc = (num**2 for num in range(10000))
print(sys.getsizeof(nums_squared_lc))
print(sys.getsizeof(nums_squared_gc))
87632
128

# cProfile.run() prints its report itself and returns None, so no print() is needed
cProfile.run('sum([i * 2 for i in range(10000)])')
cProfile.run('sum((i * 2 for i in range(10000)))')
         5 function calls in 0.002 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.001    0.001    0.001    0.001 <string>:1(<listcomp>)
        1    0.000    0.000    0.002    0.002 <string>:1(<module>)
        1    0.000    0.000    0.002    0.002 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.sum}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

         10005 function calls in 0.003 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    10001    0.002    0.000    0.002    0.000 <string>:1(<genexpr>)
        1    0.000    0.000    0.003    0.003 <string>:1(<module>)
        1    0.000    0.000    0.003    0.003 {built-in method builtins.exec}
        1    0.001    0.001    0.003    0.003 {built-in method builtins.sum}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
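
To see how the memory side of the trade-off scales, here is a rough sketch that repeats the sys.getsizeof check for a few sizes. The sizes are arbitrary picks for the example, and keep in mind that sys.getsizeof only measures the container itself, not the integers stored inside a list, so the real gap is even larger.

```python
import sys

sizes = (10, 10_000, 100_000)  # arbitrary sizes chosen for illustration
list_sizes = []
gen_sizes = []

for size in sizes:
    as_list = [num ** 2 for num in range(size)]
    as_gen = (num ** 2 for num in range(size))
    list_sizes.append(sys.getsizeof(as_list))
    gen_sizes.append(sys.getsizeof(as_gen))

# The list grows with the number of items; the generator object does not.
for size, list_bytes, gen_bytes in zip(sizes, list_sizes, gen_sizes):
    print(f"{size:>7} items: list={list_bytes} bytes, generator={gen_bytes} bytes")
```

The exact byte counts depend on your Python version, so no output is shown here.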

More on the yield statement

When using the yield statement you are basically controlling the iteration steps in the generator. The generator runs until it reaches a yield statement, then suspends and hands the yielded value back to the caller. Unlike a return statement, yield does not end the function: the next time a value is requested, execution continues right after the yield. When a generator suspends, the state of the function is saved. This includes any variables local to the generator, the location in your code, and any exception handling. Once you have iterated over all the values, though, your generator will raise a StopIteration exception.

def youryield():
  yieldline = "This will print the first string"
  yield yieldline
  yieldline = "This will print the second string"
  yield yieldline

twoyields = youryield()
print(next(twoyields))
This will print the first string
print(next(twoyields))
This will print the second string
print(next(twoyields))
---------------------------------------------------------------------------

StopIteration                             Traceback (most recent call last)

 in ()
----> 1 print(next(twoyields))

StopIteration: 
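
You rarely have to deal with StopIteration by hand. As a small sketch, a for loop (or a comprehension) stops quietly when the generator runs out, and next() accepts a second argument to return instead of raising. The shortened strings and the "no more values" fallback are placeholders for the example.

```python
def youryield():
    yield "first string"
    yield "second string"


# A comprehension or for loop ends on its own when the generator is exhausted.
collected = [line for line in youryield()]

twoyields = youryield()
next(twoyields)
next(twoyields)
# The generator is exhausted now, so the default is returned instead of an error.
fallback = next(twoyields, "no more values")

print(collected)
print(fallback)
```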

Learn a few generator methods
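
This section does not spell out which methods it means, so the sketch below is an assumption: it shows the three methods every Python generator object has, which are send(), throw(), and close(). The running_total generator is a made-up example, not something from the earlier sections.

```python
def running_total():
    """Yield the running total; new values are fed in with send()."""
    total = 0
    while True:
        received = yield total
        if received is not None:
            total += received


gen = running_total()
next(gen)                 # prime the generator: run to the first yield
after_ten = gen.send(10)  # resumes the generator with received = 10
after_five = gen.send(5)  # resumes again with received = 5
gen.close()               # stops the generator at its current yield

try:
    next(gen)
    closed = False
except StopIteration:
    closed = True         # a closed generator has nothing left to give

gen2 = running_total()
next(gen2)
try:
    # throw() raises the exception inside the generator at the paused yield.
    gen2.throw(ValueError("bad input"))
except ValueError as exc:
    thrown_message = str(exc)  # not handled inside, so it reaches the caller

print(after_ten, after_five, closed, thrown_message)
```

Note that a generator has to be advanced to its first yield with next() before you can send() a value into it.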


Cheat Sheet Python TITLE:

Syntax

Functions

Common Mistakes

Typical Use
