Python: Iterators and Generators

Iterator Protocol

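  • An iterator is an object
  • An iterator can be passed to the next() function, which returns a value
  • An iterator can be passed to the iter() function, which returns the iterator itself
  • Successive next() calls return a series of values, one at a time
  • When the end of the iteration is reached, the iterator raises a StopIteration exception
  • An iterator may also have no end: in that case every next() call returns a value
  • In Python, the built-in next() function calls the object's __next__() method
  • In Python, the built-in iter() function calls the object's __iter__() method
  • An object that implements the iterator protocol can be looped over by a for statement until the iteration terminates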
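The bullets above can be checked directly on a built-in list iterator. Here is a minimal sketch. The list `[10, 20]` is arbitrary sample data and is not from the original post:

```python
numbers = [10, 20]          # sample data; any list works

it = iter(numbers)          # calls numbers.__iter__() and returns a list iterator
print(iter(it) is it)       # an iterator's __iter__() returns the iterator itself -> True

print(next(it))             # calls it.__next__() -> 10
print(next(it))             # -> 20

try:
    next(it)                # no items left
except StopIteration:
    print("StopIteration raised: the iterator is exhausted")
```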

Example-1

As long as an object implements the __next__() method, it can be passed to the next() function:

class XIterator:
    def __next__(self):
        # Always returns the same value, so this iterator never ends
        return "Hello World"


def main():
    x_it = XIterator()
    # Call next() three times
    for _ in range(3):
        print(next(x_it))


if __name__ == "__main__":
    main()

Output:
Hello World
Hello World
Hello World

Example-2

class XIterator:
    def __init__(self):
        self.elements = list(range(5))

    def __next__(self):
        if self.elements:
            return self.elements.pop()
        # No StopIteration when empty: the method implicitly returns None


def main():
    x_it = XIterator()
    for _ in range(7):
        print(next(x_it))

    # There is no __iter__() method, so iterating with a for statement raises an error
    for it in x_it:
        print(it)


if __name__ == "__main__":
    main()

Output:
4
3
2
1
0
None
None

The for loop then fails with:

TypeError: 'XIterator' object is not iterable
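Adding `__iter__()` alone would not be enough. Because `__next__()` above never raises `StopIteration`, a `for` loop would keep receiving `None` forever. Here is a minimal sketch using a hypothetical `NoStopIterator` class. It uses `itertools.islice` only to cap the otherwise endless loop at 7 items:

```python
from itertools import islice


class NoStopIterator:
    """Hypothetical: like XIterator in Example-2 plus __iter__(), but still no StopIteration."""

    def __init__(self):
        self.elements = list(range(5))

    def __next__(self):
        if self.elements:
            return self.elements.pop()
        # falls through and implicitly returns None; the end is never signalled

    def __iter__(self):
        return self


# islice stops after 7 items; a bare `for x in NoStopIterator()` would never end
values = list(islice(NoStopIterator(), 7))
print(values)  # [4, 3, 2, 1, 0, None, None]
```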

How the for statement works internally

# Any iterable works here; this sample list is only for illustration
iterable = [1, 2, 3]

for element in iterable:
    print(element)  # do something with element

# =========== Internal implementation ===========

# create an iterator object from the iterable
iter_obj = iter(iterable)

# infinite loop
while True:
    try:
        # get the next item
        element = next(iter_obj)
    except StopIteration:
        # if StopIteration is raised, break from the loop
        break
    # do something with element
    print(element)

Notes:

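  • The for statement works on an iterable, not necessarily on an iterator
  • The first thing the for statement does is obtain an iterator from the iterable
  • The loop is actually terminated by detecting the StopIteration exception
  • So, to be looped over by a for statement, three things are needed: __iter__() returning an iterator, __next__() on that iterator, and StopIteration to signal the end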
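One way to confirm these notes is to log the protocol calls that the `for` statement makes. Here is a sketch with a hypothetical `TracedIterator` class that records every call in a module-level `calls` list:

```python
calls = []  # records which protocol methods the for statement invokes


class TracedIterator:
    """Hypothetical iterator that logs every protocol call it receives."""

    def __init__(self, n):
        self.remaining = n

    def __iter__(self):
        calls.append("__iter__")
        return self

    def __next__(self):
        calls.append("__next__")
        if self.remaining == 0:
            raise StopIteration  # this is what ends the for loop
        self.remaining -= 1
        return self.remaining


for value in TracedIterator(2):
    print(value)  # prints 1, then 0

print(calls)  # ['__iter__', '__next__', '__next__', '__next__']
```

`__iter__()` is called once. `__next__()` is called one more time than there are items, because the last call raises `StopIteration` and ends the loop.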

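If we can obtain an iterator from an object, that object is an iterable. Every iterator is also an iterable, because it implements __iter__(). However, an iterable is not necessarily an iterator. For details, see the post "python之Iterable与Iterator" (Iterable vs. Iterator in Python).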
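The difference shows up when the same object is looped over twice. Here is a minimal sketch using a hypothetical `Countdown` class. It is an iterable but not an iterator: its `__iter__()` hands out a fresh iterator on every call, whereas an iterator is used up after one pass:

```python
class Countdown:
    """Hypothetical iterable (not an iterator): __iter__ returns a fresh iterator."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # a new range iterator each time, so the object can be looped over repeatedly
        return iter(range(self.start, 0, -1))


countdown = Countdown(3)
print(list(countdown))  # [3, 2, 1]
print(list(countdown))  # [3, 2, 1] again: each pass gets a new iterator

it = iter(countdown)    # an iterator is single-use
print(list(it))         # [3, 2, 1]
print(list(it))         # []: already exhausted
print(hasattr(countdown, "__next__"), hasattr(it, "__next__"))  # False True
```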

Example-3

class XIterator:
    def __init__(self):
        self.elements = list(range(5))

    def __next__(self):
        if self.elements:
            return self.elements.pop()
        else:
            raise StopIteration

    def __iter__(self):
        return self


def main():
    x_it = XIterator()

    for it in x_it:
        print(it)


if __name__ == "__main__":
    main()

Output:
4
3
2
1
0

Generators

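  • The iterator protocol is very useful, but implementing it by hand is somewhat tedious
  • Generators implement the iterator protocol automatically while keeping the code concise and elegant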
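To see that a generator really implements the protocol without any extra code, inspect a generator object directly. A minimal sketch with a hypothetical two-value generator `gen()`:

```python
def gen():
    yield "a"
    yield "b"


g = gen()
# The generator object has both protocol methods without us writing them
print(hasattr(g, "__iter__"), hasattr(g, "__next__"))  # True True
print(iter(g) is g)      # True: like any iterator, iter() returns the object itself
print(next(g), next(g))  # a b

try:
    next(g)
except StopIteration:
    print("StopIteration raised automatically when the function body ends")
```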

Way 1 to create a generator: the yield expression

def f():
    yield 1
    yield 2
    yield 3


def main():
    f_gen = f()  # calling f() only creates a generator object

    for x in f_gen:
        print(x)


if __name__ == "__main__":
    main()

Output:
1
2
3
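Each `next()` call runs the function body only up to the next `yield`, then pauses it and keeps its local state. Here is a small sketch with a hypothetical `steps()` function that logs its progress into a list:

```python
log = []


def steps():
    log.append("start")
    yield 1
    log.append("after first yield")
    yield 2
    log.append("end")  # reached only when next() is called a third time


s = steps()
print(log)                   # []: calling steps() does not run any of the body yet
print(next(s))               # 1
print(log)                   # ['start']
print(next(s))               # 2
print(log)                   # ['start', 'after first yield']
print(next(s, "exhausted"))  # exhausted: next() returns the default instead of raising
print(log)                   # ['start', 'after first yield', 'end']
```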

Way 2 to create a generator: generator expressions

g = (x ** 2 for x in range(5))
print(g)  # prints the generator object, not its values

for x in g:
    print(x)

Output:
<generator object <genexpr> at 0x000001DD39B98FC0>
0
1
4
9
16
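Like any iterator, a generator expression can be consumed only once. A quick sketch:

```python
g = (x ** 2 for x in range(5))

print(list(g))  # [0, 1, 4, 9, 16]
print(list(g))  # []: a generator can only be consumed once

# Re-create the expression (or use a list) if the values are needed again
g = (x ** 2 for x in range(5))
print(sum(g))   # 30
```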
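Passing a generator expression directly to a function such as sum() avoids building an intermediate list:

# List comprehension: builds the full 10-million-element list in memory first
print(sum([x ** 2 for x in range(10000000)]))
# Generator expression: yields one value at a time, so extra memory use stays small
print(sum(x ** 2 for x in range(10000000)))

Output:
333333283333335000000
333333283333335000000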
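The memory difference can be made visible with `sys.getsizeof`. This is a rough sketch only. `getsizeof` measures the container object itself, not the integers it references, and exact byte counts vary by Python version and platform, so no concrete numbers are claimed here:

```python
import sys

n = 1_000_000

squares_list = [x ** 2 for x in range(n)]
squares_gen = (x ** 2 for x in range(n))

# getsizeof reports the container object itself (not the int objects it references)
print(sys.getsizeof(squares_list))  # several megabytes, grows with n
print(sys.getsizeof(squares_gen))   # a small constant, independent of n

print(sum(squares_list) == sum(squares_gen))  # True
```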

Why generators?

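  1. Compared with implementing the iterator protocol by hand, generators need less code and are more readable
  2. Compared with manipulating elements in a list, using a generator directly can save a lot of memory
  3. Sometimes we need to describe an infinite data stream that could never fit in memory
  4. You can build generator pipelines (multiple generators chained together)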
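Point 1 is easy to see by rewriting the `XIterator` class from Example-3 as a generator function. Here is a sketch with a hypothetical name, `x_generator`. It produces the same values without a class, `__iter__()`, `__next__()`, or an explicit `StopIteration`:

```python
def x_generator():
    """Generator version of XIterator from Example-3 (hypothetical name)."""
    elements = list(range(5))
    while elements:
        yield elements.pop()
    # returning from the function raises StopIteration automatically


for it in x_generator():
    print(it)  # 4, 3, 2, 1, 0
```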

Representing the entire Fibonacci sequence with a generator

def fibonacci():
    temp = [1, 1]
    while True:
        temp.append(sum(temp))
        yield temp.pop(0)  # after the pop, len(temp) is always 2 again
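Because this generator never ends, a consumer must take only a finite part of it. One way is `itertools.islice`, which takes the first items lazily without trying to build the whole sequence:

```python
from itertools import islice


def fibonacci():
    temp = [1, 1]
    while True:
        temp.append(sum(temp))
        yield temp.pop(0)


# islice takes a finite slice from the infinite generator
first_ten = list(islice(fibonacci(), 10))
print(first_ten)  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```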

Processing data modularly with a generator pipeline

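def fibonacci():
    temp = [1, 1]
    while True:
        temp.append(sum(temp))
        yield temp.pop(0)  # after the pop, len(temp) is always 2 again


def dataflow():
    # second stage: square each Fibonacci number as it arrives
    for x in fibonacci():
        yield x ** 2


if __name__ == "__main__":
    for x in dataflow():
        print(x)
        if x > 100:  # checked after printing, so the first value above 100 is printed too
            break

Output: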
1
1
4
9
25
64
169
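The pipeline can be extended with more stages, and the stop condition can itself be a stage. Here is a sketch with two hypothetical stage functions, `squares` and `evens`, plus `itertools.takewhile` in place of the manual `break`. Unlike the loop above, `takewhile` does not emit the first value that exceeds the limit:

```python
from itertools import takewhile


def fibonacci():
    temp = [1, 1]
    while True:
        temp.append(sum(temp))
        yield temp.pop(0)


def squares(numbers):
    # stage 2: square each incoming value
    for x in numbers:
        yield x ** 2


def evens(numbers):
    # stage 3 (hypothetical extra stage): keep only even values
    for x in numbers:
        if x % 2 == 0:
            yield x


# takewhile stops the infinite pipeline at the first value above 1000
pipeline = takewhile(lambda x: x <= 1000, evens(squares(fibonacci())))
print(list(pipeline))  # [4, 64]
```

Each stage pulls one value at a time from the stage before it. No intermediate list is built, even though `fibonacci()` is infinite.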