Date: 2020-09-07

Building request headers

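https://curl.trillworks.com/ converts a curl command (for example one copied with "Copy as cURL" in Chrome DevTools) into Python requests code.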
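The converter saves you from typing the headers dict by hand. As a rough illustration of the idea only (not the site's implementation), the hypothetical helper `curl_to_headers` below uses the standard library's `shlex` to pull the `-H`/`--header` values out of a curl command and return them as a dict you could pass as `headers=` to requests. It assumes bash-style quoting in the copied command.

```python
import shlex


def curl_to_headers(curl_cmd: str) -> dict:
    """Collect the -H/--header values of a curl command into a dict.

    Hypothetical helper: a rough sketch of the idea, not the
    converter's actual implementation. Assumes bash-style quoting.
    """
    tokens = shlex.split(curl_cmd)
    headers = {}
    # Walk consecutive token pairs: a header flag is followed by its value
    for flag, value in zip(tokens, tokens[1:]):
        if flag in ("-H", "--header"):
            # Split "Name: value" on the first colon only
            name, _, val = value.partition(":")
            headers[name.strip()] = val.strip()
    return headers


cmd = (
    "curl 'https://example.com/api?q=1' "
    "-H 'User-Agent: Mozilla/5.0' "
    "-H 'Referer: https://example.com/' "
    "--compressed"
)
print(curl_to_headers(cmd))
```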

Generating a random User-Agent with fake_useragent

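from fake_useragent import UserAgent

# verify_ssl=False skips certificate checks when older releases download the UA data;
# recent releases may not accept this argument
ua = UserAgent(verify_ssl=False)
headers = {"User-Agent": ua.random}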
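The snippet above depends on the fake_useragent package. As a dependency-free sketch of the same idea, the code below keeps a small pool of User-Agent strings and picks one at random for each request. `UA_POOL` and `random_headers` are hypothetical names, and the strings are placeholders, not an up-to-date browser list.

```python
import random

# Hypothetical hand-picked pool; fake_useragent draws from a much larger
# dataset. These strings are placeholders for illustration only.
UA_POOL = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


def random_headers() -> dict:
    """Return a fresh headers dict with a randomly chosen User-Agent."""
    return {"User-Agent": random.choice(UA_POOL)}


# Call once per request so consecutive requests can carry different UAs
for _ in range(3):
    print(random_headers())
```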

robots.txt

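Some sites list sitemaps in their robots.txt file, and those sitemaps may contain the data you want.
Example: the sitemap referenced in https://www.douban.com/robots.txt lists the film reviews, book reviews, discussion posts and other content Douban published the previous day.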
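Python's standard library can pull these Sitemap entries out for you: `urllib.robotparser.RobotFileParser.site_maps()` (Python 3.8+) returns the URLs listed on `Sitemap:` lines. The sketch below parses a made-up robots.txt from a string so it runs offline; for a real site you would call `set_url()` and `read()` instead of `parse()`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; in practice fetch it from
# https://<site>/robots.txt with RobotFileParser.set_url() and read().
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

Sitemap: https://www.example.com/sitemap_index.xml
Sitemap: https://www.example.com/sitemap_updated.xml
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# site_maps() returns the list of Sitemap URLs, or None if there are none
print(rp.site_maps())

# The same parser can check whether a path may be crawled
print(rp.can_fetch("*", "https://www.example.com/private/page"))  # False
print(rp.can_fetch("*", "https://www.example.com/public/page"))   # True
```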

Editing cookies

EditThisCookie

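Scraping versus anti-scraping is a war without gunfire: you never know what traps the other side has laid for you, and tampering with cookies is one of them. That is when this extension helps with the analysis. After installing EditThisCookie in Chrome, click its small icon at the top right of the toolbar to add, delete, edit and look up the cookies of the current page, which makes simulating cookie state much easier.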
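Once the right cookie values have been worked out in the browser, they still have to be carried into the scraper. A minimal sketch, assuming the cookies are copied as a single `name=value; ...` string: the standard library's `http.cookies.SimpleCookie` parses it into a dict that can be edited and joined back into a `Cookie` header value. The helper names and cookie values are hypothetical.

```python
from http.cookies import SimpleCookie


def cookie_header_to_dict(cookie_header: str) -> dict:
    """Parse a 'name=value; name2=value2' Cookie header into a dict."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    return {name: morsel.value for name, morsel in jar.items()}


def dict_to_cookie_header(cookies: dict) -> str:
    """Join a dict back into a Cookie header value."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


# Hypothetical cookie string copied from the browser after editing
raw = "sessionid=abc123; theme=dark; lang=en"
cookies = cookie_header_to_dict(raw)
cookies["lang"] = "zh-CN"  # tweak a value, as you might in EditThisCookie
print(dict_to_cookie_header(cookies))
```

SimpleCookie follows older cookie rules and may skip values that contain unusual characters, so compare the parsed result with what the browser shows.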

Generating Selenium code automatically

Use the Chrome extension Katalon Recorder, which records your actions in the browser and can export them as Selenium scripts.

Converting raw headers automatically

from copyheaders import headers_raw_to_dict

# Headers copied from Chrome DevTools, one "name:value" per line.
# Lines starting with ":" are HTTP/2 pseudo-headers, not ordinary request headers;
# remove them from the result before passing it to requests.
headers = b'''
    :authority:c.y.qq.com
    :method:GET
    :path:/soso/fcgi-bin/client_search_cp?ct=24&qqmusic_ver=1298&new_json=1&remoteplace=txt.yqq.center&searchid=46360413927906065&t=0&aggr=1&cr=1&catZhida=1&lossless=0&flag_qc=0&p=1&n=20&w=%E6%98%8E%E5%A4%A9%E4%BD%A0%E5%A5%BD&g_tk=5381&jsonpCallback=MusicJsonCallback7934911028613236&loginUin=0&hostUin=0&format=jsonp&inCharset=utf8&outCharset=utf-8&notice=0&platform=yqq&needNewCode=0
    :scheme:https
    accept:*/*
    accept-encoding:gzip, deflate, sdch, br
    accept-language:zh-CN,zh;q=0.8
    cookie:cuid=6852877350; pgv_pvi=6596119552; RK=xB5dmM0g81; tvfe_boss_uuid=622f2b2912bb7f83; o_cookie=2353184487; ts_refer=www.baidu.com/link; ptcz=410ebd7ac68d0a114d731d573a83ff7f6572ed57fa43d90ad9ab90c7205751d8; pt2gguin=o2353184487; pgv_si=s6436702208; yplayer_open=1; yq_index=0; qqmusic_fromtag=66; yqq_stat=0; pgv_info=ssid=s4116171870; ts_last=y.qq.com/portal/search.html; pgv_pvid=2839864484; ts_uid=2016409769; player_exist=1
    referer:https://y.qq.com/portal/search.html
    user-agent:Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36 Core/1.53.4549.400 QQBrowser/9.7.12900.400
    '''
headers = headers_raw_to_dict(headers)
print(headers)
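If installing copyheaders is not an option, the conversion is simple enough to sketch with plain Python. The hypothetical `raw_headers_to_dict` below is a stand-in, not the package's code: it works on `str` rather than the bytes used above, splits each line on its first colon, and skips HTTP/2 pseudo-headers such as `:authority`, since those are not sent as ordinary request headers.

```python
def raw_headers_to_dict(raw: str) -> dict:
    """Turn headers copied from DevTools ('name:value' per line) into a dict.

    Hypothetical stand-in for copyheaders.headers_raw_to_dict: str input,
    pseudo-headers (lines starting with ':') are dropped.
    """
    headers = {}
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines and pseudo-headers like ":authority" or ":method"
        if not line or line.startswith(":"):
            continue
        name, sep, value = line.partition(":")
        if sep:  # ignore lines that contain no colon at all
            headers[name.strip()] = value.strip()
    return headers


raw = """
    :authority:c.y.qq.com
    :method:GET
    accept:*/*
    accept-language:zh-CN,zh;q=0.8
    referer:https://y.qq.com/portal/search.html
"""
print(raw_headers_to_dict(raw))
```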

Reposted from: https://www.jianshu.com/p/8238d269f8b4

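The original author is an interesting person. If this repost infringes any rights, please let me know and it will be removed.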

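Unless otherwise noted, all articles on this blog are original.
When copying or reposting, please credit 起风了 with a hyperlink; original post: "Python Scraping Tips"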
   
