How the Python future-fstrings Library Works

Preface

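A long time ago I came across a library that feels like Python black magic: future-fstrings, an f-string implementation that also works on Python 2.
The way you use it is just as curious.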


# -*- coding: future_fstrings -*-
thing = 'world'
print(f'hello {thing}')

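Once future-fstrings is installed correctly, a coding declaration is all it takes to use f-strings, a feature that otherwise requires Python 3.6 or later.
What's more, this trick needs no C extension at all; it is implemented entirely in plain Python.
The main script of future-fstrings is under 300 lines of code.
So how on earth does it work?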

Exploring the future-fstrings main script

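In the GitHub repository, apart from setup.py (the install script), there is only one main script, future_fstrings.py.
At first my attention went straight to the main function.
But main appears to be just a command-line tool: running the script directly with python raises an error asking for arguments.
Clearly the script is not meant to be used that way.
So I kept reading the code above it: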

fstring_decode = decode
SUPPORTS_FSTRINGS = _natively_supports_fstrings()
if SUPPORTS_FSTRINGS:  # pragma: no cover
    decode = utf_8.decode  # noqa
    IncrementalDecoder = utf_8.incrementaldecoder  # noqa
    StreamReader = utf_8.streamreader  # noqa

# codec api

codec_map = {
    name: codecs.CodecInfo(
        name=name,
        encode=utf_8.encode,
        decode=decode,
        incrementalencoder=utf_8.incrementalencoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=utf_8.streamwriter,
    )
    for name in ('future-fstrings', 'future_fstrings')
}

def register():  # pragma: no cover
    codecs.register(codec_map.get)

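All I found running at module level were constant declarations; the entry point was nowhere to be seen (:з」∠)
The way the dict is built also surprised me: I had no idea a Python dict could be written like this (it is a dict comprehension).
Printing the resulting codec_map gives: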

print codec_map
{u'future-fstrings': <codecs.CodecInfo object for encoding future-fstrings at 0x266d888>, u'future_fstrings': <codecs.CodecInfo object for encoding future_fstrings at 0x2cba4c8>}

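The comprehension does more than set the dict key: the same name is also passed as an argument to the CodecInfo constructor.
It is a quick way to generate instances that differ only in their arguments.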
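To make the pattern concrete, here is a minimal sketch of the same dict comprehension in isolation. The helper make_codec_map and the names 'demo-codec' and 'demo_codec' are made up for illustration, and plain UTF-8 functions stand in for the real decoder.

```python
import codecs


def make_codec_map(names):
    # One CodecInfo per name; the loop variable is used both as the
    # dict key and as the `name` argument of the constructor.
    return {
        name: codecs.CodecInfo(
            name=name,
            encode=codecs.utf_8_encode,
            decode=codecs.utf_8_decode,
        )
        for name in names
    }


demo_map = make_codec_map(('demo-codec', 'demo_codec'))
for key, info in sorted(demo_map.items()):
    print(key, info.name)
```

Each entry is a separate CodecInfo instance whose name attribute matches its key.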

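OK, we nearly got sidetracked. What we still need to find is the entry point of future-fstrings.
Judging from the main script alone, there is no entry point other than main.
Then I read the future-fstrings README more carefully, and it mentions: A .pth file which registers a codec on interpreter startup.
What is this .pth file?
A quick search explained it: a line in a .pth file that starts with import is executed as Python code when the interpreter (python.exe) starts up and processes site-packages.
This makes it possible to run initialization code at startup.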
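The .pth mechanism can be tried out without touching the real site-packages. The sketch below writes a throwaway .pth file into a temporary directory and asks the standard site module to process it; the file name demo_hook.pth and the environment variable PTH_DEMO_RAN are invented for this demo.

```python
import os
import site
import sys
import tempfile


def run_pth_demo():
    with tempfile.TemporaryDirectory() as site_dir:
        pth_path = os.path.join(site_dir, 'demo_hook.pth')
        with open(pth_path, 'w') as handle:
            # A line starting with "import" is executed by the site module;
            # any other line would be treated as a path to add to sys.path.
            handle.write("import os; os.environ['PTH_DEMO_RAN'] = 'yes'\n")
        # Process the directory the way site-packages is processed at startup.
        site.addsitedir(site_dir)
        # Undo the sys.path entry that addsitedir added for the temp directory.
        if site_dir in sys.path:
            sys.path.remove(site_dir)
    return os.environ.get('PTH_DEMO_RAN')


result = run_pth_demo()
print(result)
```

At real interpreter startup the same processing happens automatically for site-packages, which is why nothing has to be imported explicitly.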

The GitHub repository contains no .pth file, though, so I looked at setup.py.
Sure enough, that is where the .pth content is generated:

PTH = (
    'try:\n'
    '    import future_fstrings\n'
    'except ImportError:\n'
    '    pass\n'
    'else:\n'
    '    future_fstrings.register()\n'
)

You can also look in Python's site-packages directory, where there is a file named aaaaa_future_fstrings.pth (quite a casual choice of name):

import sys; exec('try:\n import future_fstrings\nexcept ImportError:\n pass\nelse:\n future_fstrings.register()\n')

So that is the entry point: every time python.exe starts, the register function is executed.

def register():  # pragma: no cover
    codecs.register(codec_map.get)

register in turn calls codecs.register with a codec search function, which appears to be how a new encoding gets defined.
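As a rough illustration of why codec_map.get works as the argument here: codecs.register expects a function that takes an encoding name and returns a CodecInfo or None, and dict.get has exactly that shape. The codec name 'plainutf8demo' below is invented, and it simply reuses the UTF-8 functions.

```python
import codecs

info = codecs.CodecInfo(
    name='plainutf8demo',
    encode=codecs.utf_8_encode,
    decode=codecs.utf_8_decode,
)
registry = {'plainutf8demo': info}

# dict.get returns None for unknown names, which tells the codec
# machinery to keep looking in other search functions.
codecs.register(registry.get)

found = codecs.lookup('plainutf8demo')
decoded = b'hi'.decode('plainutf8demo')

try:
    codecs.lookup('no_such_codec_demo')
    unknown_is_error = False
except LookupError:
    unknown_is_error = True

print(found.name, decoded, unknown_is_error)
```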

Studying the codecs library

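codecs is Python's built-in library for character encodings, and according to the official documentation Python supports registering custom encodings.
For how to actually use that, though, I had to turn to Stack Overflow (the link is in the code comment below).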

The Stack Overflow code targets Python 3; I rewrote it as a Python 2 version:

import codecs
import string

# NOTE https://stackoverflow.com/questions/38777818/how-do-i-properly-create-custom-text-codecs

# prepare map from numbers to letters
_encode_table = {str(number): bytes(letter) for number, letter in enumerate(string.ascii_lowercase)}

# prepare inverse map
_decode_table = {v: k for k, v in _encode_table.items()}


def custom_encode(text):
    # example encoder that converts ints to letters
    # see https://docs.python.org/3/library/codecs.html#codecs.Codec.encode
    return b''.join(_encode_table[x] for x in text), len(text)


def custom_decode(binary):
    # example decoder that converts letters to ints
    # see https://docs.python.org/3/library/codecs.html#codecs.Codec.decode
    return ''.join(_decode_table[x] for x in binary), len(binary)


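def custom_search_function(encoding_name):
    # the lookup name arrives lowercased; return None for any other encoding
    if encoding_name != 'reasons':
        return None
    return codecs.CodecInfo(encode=custom_encode, decode=custom_decode, name='Reasons')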


def main():

    # register your custom codec
    # note that CodecInfo.name is used later
    codecs.register(custom_search_function)

    binary = 'abcdefg'
    # decode letters to numbers
    text = binary.decode('Reasons')
    print(text)
    # 0123456

    # encode numbers to letters
    binary2 = text.encode('Reasons') 
    print(binary2)
    # abcdefg

if __name__ == '__main__':
    main()

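The example above shows how to define your own codec that turns one string into another.
Here decode magically turns the letters into digits, and encode turns them back.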
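For readers on Python 3, here is a minimal sketch of the same idea, written from scratch rather than copied from the Stack Overflow answer. The codec name 'lettersdemo' and the function names are invented, and only the digits 0 to 9 are mapped to keep it short.

```python
import codecs
import string

# Map each digit character to a lowercase letter: '0' -> 'a', ..., '9' -> 'j'.
_ENCODE_TABLE = {str(i): string.ascii_lowercase[i] for i in range(10)}
_DECODE_TABLE = {letter: digit for digit, letter in _ENCODE_TABLE.items()}


def digits_encode(text, errors='strict'):
    # str of digits -> bytes of letters
    encoded = ''.join(_ENCODE_TABLE[ch] for ch in text)
    return encoded.encode('ascii'), len(text)


def digits_decode(data, errors='strict'):
    # bytes of letters -> str of digits; bytes() also accepts a memoryview
    letters = bytes(data).decode('ascii')
    return ''.join(_DECODE_TABLE[ch] for ch in letters), len(letters)


def digits_search(name):
    # Only answer for our own codec name; None lets other lookups proceed.
    if name == 'lettersdemo':
        return codecs.CodecInfo(
            name='lettersdemo', encode=digits_encode, decode=digits_decode)
    return None


codecs.register(digits_search)

decoded = b'abcdefg'.decode('lettersdemo')
encoded = decoded.encode('lettersdemo')
print(decoded, encoded)
```

The main difference from the Python 2 version is that decode now receives bytes-like data and must return str, while encode does the reverse.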

This simple example gives a rough idea of how future-fstrings works.
We can verify the guess with the following example (Python 2):

text = "world"
fstring = 'f"hello {text}"'.decode('future-fstrings')
print fstring
# "hello {}".format((text))

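As you can see, future-fstrings simply rewrites f-string code into the equivalent format call.
Since future-fstrings is pure Python source rewriting rather than C-level support, the resulting code surely cannot match the speed of native f-strings on Python 3.
Still, the fact that code can be transformed this way left me thoroughly amazed.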
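To get a feel for the rewriting step, here is a deliberately naive sketch based on regular expressions. It is not the library's implementation; it only handles simple one-line f-strings with no format specs, conversions, escaped braces, or nested quotes, and the function name toy_fstring_to_format is invented.

```python
import re

# A single- or double-quoted string prefixed with a lone "f".
_FSTRING = re.compile(r"""\bf(?P<quote>['"])(?P<body>.*?)(?P=quote)""")
# A replacement field such as {thing}.
_FIELD = re.compile(r"\{([^{}]+)\}")


def toy_fstring_to_format(source):
    def rewrite(match):
        quote, body = match.group('quote'), match.group('body')
        expressions = _FIELD.findall(body)      # e.g. ['thing']
        template = _FIELD.sub('{}', body)       # e.g. 'hello {}'
        args = ', '.join('({})'.format(e) for e in expressions)
        return '{q}{t}{q}.format({a})'.format(q=quote, t=template, a=args)

    return _FSTRING.sub(rewrite, source)


converted = toy_fstring_to_format("print(f'hello {thing}')")
print(converted)
```

The output has the same shape as the decode result shown earlier: the expressions are pulled out of the braces and passed, parenthesized, to format.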

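I had previously studied how pdb traces code, so I knew that source-level tracing does not work on a pyc file (unless the source sits in the same directory as the pyc).
That is because a pyc already holds Python bytecode; the dis library shows its assembly-like instructions.
So I started to wonder: this kind of text substitution needs to read the source, so presumably it would not work from a pyc file.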

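However, when I tested the official usage with the coding declaration at the top of the file, future-fstrings worked from a pyc without any problem.
That made me curious about how this whole-file code replacement is actually carried out.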

How the coding declaration achieves whole-file replacement

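At first I planned to use breakpoints to study what happens behind the coding declaration.
But the coding declaration has to sit at the very top of the file, which leaves no way to start pdb before it takes effect.
Calling the codec explicitly (as in the decode example above), on the other hand, can be traced with pdb into the future-fstrings script:
execution lands in the decode function, with the string passed in as its argument.
So for the declaration case I fell back on traditional print debugging,
and then everything became clear.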

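With the header declaration in place, decode receives the source code of the entire file.
future-fstrings then uses string manipulation to turn every f-string into the equivalent format call.
Only after this decoding step is finished does Python compile the source and execute it.
That also explains why running from a pyc is unaffected:
a pyc file stores bytecode, which was compiled after decoding had already happened.
So when running from a pyc, even if the py file it was compiled from has the coding declaration, the decode function of future-fstrings is never triggered.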
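The order "decode first, compile second" can be simulated by hand. The sketch below is an assumption-laden model, not what the interpreter literally runs: it calls the codec explicitly instead of relying on a coding declaration, uses marshal as a stand-in for the body of a pyc file, and the codec name 'toysource' and the __GREETING__ placeholder are invented.

```python
import codecs
import marshal

decode_calls = []


def toy_decode(data, errors='strict'):
    # Decode as UTF-8, then rewrite the source text before it is compiled.
    text, consumed = codecs.utf_8_decode(bytes(data), errors, True)
    decode_calls.append(consumed)
    return text.replace('__GREETING__', "'hello world'"), consumed


def toy_search(name):
    if name == 'toysource':
        return codecs.CodecInfo(
            name='toysource', encode=codecs.utf_8_encode, decode=toy_decode)
    return None


codecs.register(toy_search)

raw_source = b"result = __GREETING__.upper()\n"
text = codecs.decode(raw_source, 'toysource')   # step 1: bytes -> rewritten text
code = compile(text, '<toy>', 'exec')           # step 2: text -> bytecode
blob = marshal.dumps(code)                      # roughly what a pyc stores

# Running the stored bytecode later never goes back through the codec.
namespace = {}
exec(marshal.loads(blob), namespace)
print(namespace['result'], len(decode_calls))
```

The decoder runs exactly once, before compilation; executing the serialized bytecode afterwards does not call it again.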

And that is how future-fstrings works~

Summary

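Once you understand codecs, you can indeed pull off a lot of black magic with them, since they operate directly on the source code.
And from the way Python handles this, compiled forms such as pyc and pyd are unaffected.
The one drawback is that a file accepts only a single coding declaration: using the future-fstrings encoding means you cannot use any other encoding for that file.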