I am writing a script that converts HTML to AMP, and I have this code:
#!/usr/bin/python3
import argparse
from amp_tools import TransformHtmlToAmp
import codecs
arg_parser = argparse.ArgumentParser( description = "Copy source_file as target_file." )
arg_parser.add_argument( "source_file" )
arg_parser.add_argument( "target_file" )
arguments = arg_parser.parse_args()
source = arguments.source_file
target = arguments.target_file
html = ""
with codecs.open(source, encoding='utf-8', mode='r+') as f:
    for line in f:
        html = html + line.rstrip()
valid_amp = str(TransformHtmlToAmp(html)())
with codecs.open(target, encoding='utf-8', mode='w+') as f:
    f.write(valid_amp.rstrip())
    f.seek(0)
#print(str(valid_amp))
print( target, "successfully created !!" )
Now, this works, but the contents of the file are saved wrapped in b''. I don't want that. Is there a way to avoid the quotes in the output file?
Sample input: <!doctype html> <html lang="en"> <head> <title>News Article</title> <link href="base.css" rel="stylesheet" /> <script type="text/javascript" src="base.js"></script> </head> <body> <header> News Site </header> <article> <h1>Article Name</h1> <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam egestas tortor sapien, non tristique ligula accumsan eu.</p> </article> <img src="https://www.travelmanagers.com.au/wp-content/uploads/2012/08/AdobeStock_254529936_Railroad-to-Denali-National-Park-Alaska_750x500.jpg"> </body> </html>
Output: b'<div lang="en" class="amp-text"> <head> <title>News Article</title> <link href="base.css" rel="stylesheet"> <script type="text/javascript" src="base.js"></script> </head> <body> <header> News Site </header> <article> <h1>Article Name</h1> <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam egestas tortor sapien, non tristique ligula accumsan eu.</p> </article> <amp-img src="https://www.travelmanagers.com.au/wp-content/uploads/2012/08/AdobeStock_254529936_Railroad-to-Denali-National-Park-Alaska_750x500.jpg" width="750" height="500" layout="responsive"></amp-img> </body></div>'
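
For what it's worth, the b'' wrapper in the output looks like what Python 3 produces when str() is called on a bytes object: str() returns the repr of the bytes, prefix and quotes included. The sketch below reproduces that with a stand-in value, on the assumption (not confirmed against the amp_tools source) that TransformHtmlToAmp(html)() returns bytes. The helper to_text is hypothetical and only there for illustration.

```python
# Stand-in for the value returned by TransformHtmlToAmp(html)().
# Assumption: that call returns bytes, which the b'...' in the file suggests.
amp_bytes = '<p class="amp-text">café</p>'.encode("utf-8")

# str() on a bytes object returns its repr, including the b'' wrapper
# and escaped non-ASCII bytes.
wrapped = str(amp_bytes)

# Decoding turns the bytes into text without any wrapper.
decoded = amp_bytes.decode("utf-8")


def to_text(result, encoding="utf-8"):
    """Hypothetical helper: return text whether result is bytes or str."""
    if isinstance(result, bytes):
        return result.decode(encoding)
    return str(result)


if __name__ == "__main__":
    print(wrapped)
    print(decoded)
    print(to_text(amp_bytes))
```

If that assumption holds, replacing str(TransformHtmlToAmp(html)()) in the script with TransformHtmlToAmp(html)().decode('utf-8') should write the markup without the wrapper. Another option would be to open the target file in binary mode ('wb') and write the bytes unchanged.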