Matthew Note

HTTP Notes

Transfer-Encoding

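HTTP/1.0 connections are not persistent by default, which makes the end of a message easy to detect: once the connection closes, the message is complete. Opening a new connection for every request adds a lot of overhead, though, so HTTP/1.1 made persistent (keep-alive) connections the default. On a persistent connection a Content-Length header is normally needed to state the length of the message so that its boundary can be determined correctly. The drawback is that the sender must know the full length before it sends the first byte. For a very large or dynamically generated entity nothing can go out until all of it is available, so the user may wait a long time before anything is displayed. Transfer-Encoding was introduced as another way to frame the message. A similar-looking header is Content-Encoding: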

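  • Transfer-Encoding: describes how the message is transferred. chunked, which sends the body in chunks, is the value seen in practice (the specification also defines compression codings such as gzip)
  • Content-Encoding: describes how the content itself is encoded, for example compression such as gzip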
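
To make the difference concrete, here is a minimal Python sketch of Content-Encoding: the content itself is gzip-compressed, and the header tells the client how to recover it. The helper name gzip_body and the header dictionary are assumptions made for illustration; they do not come from any framework.

```python
import gzip

def gzip_body(body):
    """Compress a body and return (headers, compressed_body)."""
    compressed = gzip.compress(body)
    headers = {
        "Content-Encoding": "gzip",
        # Content-Length counts the bytes on the wire, i.e. after compression.
        "Content-Length": str(len(compressed)),
    }
    return headers, compressed

# Repeated JSON so that compression has something to work with.
headers, wire = gzip_body(b'{"connectionId": "qwertyu23456", "version": 2}' * 20)
original = gzip.decompress(wire)  # what the client does on receipt
```

Transfer-Encoding, by contrast, leaves the content alone and only changes how it is framed on the connection.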

When chunked is used, the HTTP message must follow these rules:

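  1. Adding Transfer-Encoding: chunked to the headers means the message uses chunked encoding.
  2. The entity in the message is transmitted as a series of chunks.
  3. Each chunk consists of a length in hexadecimal and the data. The length sits on a line of its own, and it counts neither the CRLF (\r\n) that ends the length line nor the CRLF that ends the chunk data.
  4. The last chunk must have a length of 0 and no data; it marks the end of the entity.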

For example:

HTTP/1.1 200 OK
Server: nginx/1.1.19
Date: Mon, 13 Jul 2015 06:58:31 GMT
Transfer-Encoding: chunked
Connection: keep-alive

2f
{"connectionId": "qwertyu23456", "version": 2}
0
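
The chunk rules can be sketched in Python as a small encoder and decoder. This is a simplified illustration, not a complete implementation: chunk extensions and trailers are ignored, the function names are made up for this note, and the payload is assumed to end with a newline, which would account for the length 2f (47) in the example.

```python
def encode_chunked(parts):
    """Encode an iterable of byte strings as a chunked message body."""
    out = bytearray()
    for part in parts:
        if not part:
            continue  # a zero-length chunk would end the body early
        out += b"%x\r\n" % len(part)  # size line: hex length, then CRLF
        out += part + b"\r\n"         # chunk data, then CRLF (not counted)
    out += b"0\r\n\r\n"               # last chunk: size 0, no data
    return bytes(out)

def decode_chunked(data):
    """Decode a chunked body (chunk extensions and trailers not handled)."""
    body = bytearray()
    pos = 0
    while True:
        eol = data.index(b"\r\n", pos)
        size = int(data[pos:eol], 16)
        if size == 0:
            return bytes(body)
        start = eol + 2
        body += data[start:start + size]
        pos = start + size + 2  # skip the CRLF that closes the chunk data

# Assumption: a trailing newline, which makes the payload 47 (0x2f) bytes.
payload = b'{"connectionId": "qwertyu23456", "version": 2}\n'
wire = encode_chunked([payload])
```

Because each chunk announces its own length, the sender can start transmitting before it knows the total size.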

By contrast, the same response framed with Content-Length:

HTTP/1.1 200 OK
Server: nginx/1.1.19
Date: Mon, 13 Jul 2015 07:07:42 GMT
Content-Type: vnd.collection+json
Content-Length: 47
Connection: keep-alive

{"connectionId": "qwertyu23456", "version": 2}
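
A rough sketch of how a receiver uses Content-Length to find message boundaries on a persistent connection. The connection is simulated with an in-memory stream, and read_response is a hypothetical helper that handles only this simple case (no chunked bodies, no repeated headers).

```python
import io

def read_response(stream):
    """Read one response from a stream, using Content-Length to find its end."""
    status_line = stream.readline().rstrip(b"\r\n")
    headers = {}
    while True:
        line = stream.readline().rstrip(b"\r\n")
        if not line:  # the empty line separates headers from body
            break
        name, _, value = line.partition(b":")
        headers[name.strip().lower()] = value.strip()
    length = int(headers[b"content-length"])
    body = stream.read(length)  # exactly this many bytes belong to the body
    return status_line, headers, body

# Two responses back to back on one (simulated) keep-alive connection.
one = b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello"
two = b"HTTP/1.1 200 OK\r\nContent-Length: 3\r\n\r\nbye"
conn = io.BytesIO(one + two)
first = read_response(conn)
second = read_response(conn)
```

Without the length, the reader could not tell where the first body stops and the second status line begins.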

GET with body message

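A GET request normally has an empty body, but this is a convention rather than a hard requirement. In some cases a GET that carries a body makes semantic sense. The web framework in use may not support it, though, and client tools do not always make it easy (curl, for instance, switches to POST when given -d unless the method is forced with -X GET), so the body has to be validated by hand, for example: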

# Assumes a Flask-style request object and abort(); check() is the
# caller's own per-item validator.
def check_body(req):
    msg_body = req.json
    for itr in msg_body:
        if not check(itr):
            abort(404)
    return msg_body

Note that a GET with a body must not leave any state behind on the server. It has to be idempotent, otherwise it violates the principles of GET.
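
For reference, this is roughly what a GET request with a JSON body looks like on the wire. The builder below is only a sketch; the host, path, and payload are hypothetical, and the resulting bytes would still have to be written to a socket.

```python
import json

def build_get_with_body(host, path, payload):
    """Serialize a GET request that carries a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    head = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/json\r\n"
        f"Content-Length: {len(body)}\r\n"  # the body needs explicit framing
        "\r\n"
    ).encode("ascii")
    return head + body

# Hypothetical endpoint and payload, for illustration only.
request = build_get_with_body("example.org", "/search", {"ids": [1, 2, 3]})
```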

web.py

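The list below explains the FastCGI/WSGI parameters available in web.ctx. The problem I ran into concerned homepath: if the nginx location is configured as /api/hello and web.py is also configured with ("/api/hello", "hello"), then web.py's homepath is /api/hello and path is only what remains after that prefix, so the match obviously fails. The mapping that works is ("", "hello").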

  • environ a.k.a. env – a dictionary containing the standard WSGI environment variables
  • home – the base path for the application, including any parts “consumed” by outer applications. E.g. http://example.org/admin
  • homedomain – ? (appears to be protocol + host). E.g. http://example.org
  • homepath – The part of the path requested by the user which was trimmed off the current app. That is, homepath + path = the path actually requested in HTTP by the user. E.g. /admin. This seems to be derived during startup from the environment variable REAL_SCRIPT_NAME. It affects what web.url() will prepend to supplied urls. This in turn affects where web.seeother() will go, which might interact badly with your url rewriting scheme (e.g. mod_rewrite)
  • host – the hostname (domain) and (if not default) the port requested by the user. E.g. example.org, example.org:8080
  • ip – the IP address of the user. E.g. xxx.xxx.xxx.xxx
  • method – the HTTP method used. E.g. GET
  • path – the path requested by the user, relative to the current application. If you are using subapplications, any part of the url matched by the outer application will be trimmed off. E.g. you have a main app in code.py, and a subapplication called admin.py. In code.py, you point /admin to admin.app. In admin.py, you point /stories to a class called stories. Within stories, web.ctx.path will be /stories, not /admin/stories. E.g. /articles/845
  • protocol – the protocol used. E.g. https
  • query – an empty string if there are no query arguments otherwise a ? followed by the query string. E.g. ?fourlegs=good&twolegs=bad
  • fullpath a.k.a. path + query – the path requested including query arguments but not including homepath. E.g. /articles/845?fourlegs=good&twolegs=bad
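
The homepath/path split behind the nginx problem can be modelled in a few lines of Python. This is a simplified simulation and not web.py's own code: split_path and matches are hypothetical helpers, and script_name stands for the prefix that the outer server has already consumed.

```python
import re

def split_path(script_name, request_path):
    """Split a requested path into (homepath, path)."""
    if request_path.startswith(script_name):
        return script_name, request_path[len(script_name):]
    return "", request_path

def matches(pattern, path):
    """Match the whole path against a URL pattern, as a router would."""
    return re.fullmatch(pattern, path) is not None

# nginx location /api/hello, and the client requests /api/hello.
homepath, path = split_path("/api/hello", "/api/hello")

full_pattern_matches = matches("/api/hello", path)  # pattern repeats the prefix
empty_pattern_matches = matches("", path)           # pattern is relative to homepath
```

In this model the pattern is compared with path alone, so only a pattern written relative to homepath can match.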

Cross-origin requests

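  • Form submissions are not affected by the cross-origin restriction
  • XMLHttpRequest is affected by it, but the server can lift the restriction with the Access-Control-Allow-Origin response header (see the W3C document)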
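
A minimal WSGI sketch of a server that opts in to cross-origin reads by sending Access-Control-Allow-Origin. The allowed origin and the response body are hypothetical, and preflight (OPTIONS) handling is left out.

```python
def cors_app(environ, start_response):
    """Minimal WSGI app that lets one other origin read its responses."""
    headers = [
        ("Content-Type", "application/json"),
        # Hypothetical origin; "*" would allow any origin instead.
        ("Access-Control-Allow-Origin", "http://example.org"),
    ]
    start_response("200 OK", headers)
    return [b'{"ok": true}']
```

For a quick local check, the app can be served with wsgiref.simple_server.make_server from the standard library.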