Creating a Free IP Proxy Pool - HelloWorld Developer Community

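Anti-scraping techniques keep getting more sophisticated. To collect the target data, a crawler has to disguise its requests well enough to fool the target site. Target sites use request frequency or request parameters to spot likely crawlers. They then ban those IPs or require a security verification.

Two measures get past the anti-scraping defenses of most sites: generating a random User-Agent header with the third-party Python library fake-useragent, and slowing down the crawl rate. Sites with stricter security are still quite likely to ban the IP, though. Once that happens, you can keep crawling by switching to another proxy IP, so a working proxy pool is essential. Many providers sell dynamic proxy IPs online, but a free pool of your own is also a good option. Follow the WeChat official account 【菜鸟阿都】 and reply "ip池" to get the source code.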
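Switching to another proxy once an IP is banned can be sketched as a loop over the pool. This is a minimal sketch, not taken from the original source. It assumes a hypothetical `fetch(url, proxy)` callable that raises on any failure (a ban, a timeout, or a refused connection). Failed proxies are dropped from the pool as they are encountered.

```python
import random
from typing import Callable, List, TypeVar

T = TypeVar("T")


def fetch_with_rotation(url: str, pool: List[str],
                        fetch: Callable[[str, str], T]) -> T:
    """Try proxies from `pool` in random order until one succeeds.

    `fetch(url, proxy)` is a hypothetical callable supplied by the caller;
    it should raise an exception when the request fails or the IP is banned.
    Proxies that fail are removed from `pool` in place.
    """
    candidates = pool[:]
    random.shuffle(candidates)  # spread requests across the pool
    last_error: Exception = RuntimeError("proxy pool is empty")
    for proxy in candidates:
        try:
            return fetch(url, proxy)
        except Exception as exc:  # treat any failure as a dead or banned proxy
            last_error = exc
            pool.remove(proxy)
    raise last_error
```

The shuffle keeps one proxy from absorbing all the traffic. Removing failures in place means later calls skip proxies that have already been banned.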

How the proxy pool is built:

1. Crawl the free proxy IPs published on the internet.
2. Validate them and save the ones that work.
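The two steps above (crawl candidates, keep only those that pass validation) can be sketched as a small pipeline. Each check may wait up to its timeout, so this sketch validates candidates concurrently with `concurrent.futures.ThreadPoolExecutor`. The article itself validates one IP at a time. The names `crawl`, `is_valid`, and `save` are hypothetical stand-ins for the article's `getDate`, `checkIp`, and `write`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def build_pool(crawl: Callable[[], Iterable[str]],
               is_valid: Callable[[str], bool],
               save: Callable[[str], None],
               workers: int = 20) -> List[str]:
    """Crawl candidate proxies, validate them in parallel, save the valid ones."""
    # Deduplicate while keeping first-seen order
    candidates = list(dict.fromkeys(crawl()))
    with ThreadPoolExecutor(max_workers=workers) as executor:
        # map() yields results in the same order as the inputs
        results = list(executor.map(is_valid, candidates))
    valid = [proxy for proxy, ok in zip(candidates, results) if ok]
    for proxy in valid:
        save(proxy)  # saving runs on one thread, so the file needs no locking
    return valid
```

Keeping the file writes on the calling thread avoids interleaved lines in the output file. Only the slow network checks run in parallel.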

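Free proxy IP providers:

| Provider | URL |
|-|-|
| 快代理 (Kuaidaili) | https://www.kuaidaili.com/free/inha |
| 89免费代理 (89ip) | https://www.89ip.cn/index_1.html |
| 高可用全球免费代理ip库 (high-availability global free proxy IP list) | https://ip.jiangxianli.com/ |
| 66代理 (66ip) | http://www.66ip.cn/2.html |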

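The request is wrapped in a helper function. When a request fails, the helper waits 3 seconds and tries again, for at most 3 attempts. Each attempt uses a random request header generated with the fake-useragent library.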

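import time
import requests
from fake_useragent import UserAgent
# GET request: if the connection fails, retry up to 3 times
def GetConnect(url):
    for _ in range(3):
        try:
            headers = {'User-Agent': str(UserAgent().random)}
            response = requests.get(url, headers=headers, timeout=5)
            if response.status_code == 200:
                return response
        except requests.exceptions.RequestException:
            pass
        # Wait 3 seconds before retrying, whether the request raised
        # an exception or returned a non-200 status
        time.sleep(3)
    return None  # all 3 attempts failed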
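The retry logic of `GetConnect` can be factored into a reusable helper. This is a minimal sketch using only the standard library, and `retry` is a hypothetical name. `func` is any callable that raises on failure or returns `None` for a bad response. The `sleep` parameter exists only so tests can skip the real 3-second wait.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry(func: Callable[[], Optional[T]], attempts: int = 3, delay: float = 3.0,
          sleep: Callable[[float], None] = time.sleep) -> Optional[T]:
    """Call `func` up to `attempts` times, waiting `delay` seconds between tries.

    Returns the first non-None result, or None if every attempt failed.
    """
    for attempt in range(attempts):
        try:
            result = func()
            if result is not None:
                return result
        except Exception:
            pass  # an exception counts as a failed attempt
        if attempt < attempts - 1:
            sleep(delay)  # skip the pointless wait after the last attempt
    return None
```

The callable decides what counts as failure. For example, it can return `None` for a non-200 status so that `retry` tries again.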

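Next, fetch the IPs listed on each provider's pages. The free IPs from the four providers listed above are crawled. Each page presents its data as a table, so XPath is used to locate the table rows and extract the fields.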

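from lxml import etree
# urlNode: list of provider URL templates in which '@' stands for the page number.
# wash(): helper that cleans a scraped string. Neither is shown in this article.
def getDate():
    for template in urlNode:
        for page in range(1, 10):
            url = template.replace('@', str(page))
            print(url)
            response = GetConnect(url)
            if response is None:
                continue  # every retry failed; skip this page
            html = etree.HTML(response.text)
            tr = html.xpath('//tr')
            for row in range(1, len(tr) + 1):
                ip = html.xpath('//tr[' + str(row) + ']/td[1]/text()')
                port = html.xpath('//tr[' + str(row) + ']/td[2]/text()')
                ipType = html.xpath('//tr[' + str(row) + ']/td[4]/text()')
                # On 66ip the first row is the table header
                if len(ip) > 1:
                    continue
                # Default to HTTP when the type column is missing or not a plain word
                if len(ipType) == 0 or not ipType[0].isalpha():
                    ipType = 'HTTP'
                else:
                    ipType = ipType[0]
                if len(ip) != 0 and len(port) != 0:
                    checkIp(wash(ip[0]) + ':' + wash(port[0]), wash(ipType))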
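The table extraction in `getDate` can be tried offline. This sketch swaps lxml and XPath for the standard library's `html.parser.HTMLParser`, so it runs without third-party packages. It applies the same column rules as `getDate`:

- column 1 is the IP,
- column 2 is the port,
- column 4 is the type, defaulting to `HTTP`.

As an assumption not in the original, it skips header rows by requiring the first cell to parse with `ipaddress.ip_address`. The names `_TableRows` and `parse_proxy_table` are hypothetical.

```python
import ipaddress
from html.parser import HTMLParser
from typing import List, Tuple


class _TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
            self._in_cell = False  # a new row closes any unterminated cell
        elif tag in ("td", "th") and self.rows:
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def parse_proxy_table(html: str) -> List[Tuple[str, str, str]]:
    """Return (ip, port, type) tuples from the table rows of a provider page."""
    parser = _TableRows()
    parser.feed(html)
    proxies = []
    for cells in parser.rows:
        cells = [c.strip() for c in cells]
        if len(cells) < 2:
            continue
        try:
            ipaddress.ip_address(cells[0])  # header rows fail this check
        except ValueError:
            continue
        ip_type = cells[3] if len(cells) > 3 and cells[3].isalpha() else "HTTP"
        proxies.append((cells[0], cells[1], ip_type))
    return proxies
```

Validating the first cell as an IP address replaces the `len(ip) > 1` header check. It also filters out any other non-data rows a page might contain.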

Each IP is validated by using it as a proxy to request icanhazip.com.

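import requests
from fake_useragent import UserAgent
# Validate a proxy: ip is "host:port", ipType is e.g. "HTTP" or "HTTPS"
def checkIp(ip, ipType):
    url = 'http://icanhazip.com/'
    try:
        headers = {'User-Agent': str(UserAgent().random)}
        # Route both schemes through the proxy. With only an 'https' key,
        # the http:// test URL would bypass the proxy and always succeed.
        proxy = {
            'http': 'http://' + ip,
            'https': 'http://' + ip,
        }
        response = requests.get(url, headers=headers, proxies=proxy, timeout=5)
        if response.status_code == 200:
            # Valid IP
            write(ip, ipType)
    except requests.exceptions.RequestException:
        # Invalid IP: connection failed or timed out
        pass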
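A 200 status only shows that some response came back. icanhazip.com returns the caller's IP address as plain text, so a stricter check can compare that body with the proxy's host. This sketch rests on a few assumptions:

- It uses the standard library's `urllib.request` with a `ProxyHandler` instead of requests.
- The helper names `proxy_is_used` and `check_proxy` are hypothetical.
- A proxy whose outbound IP differs from the listed address would be rejected by this check.

```python
import http.client
import urllib.request


def proxy_is_used(proxy: str, body: str) -> bool:
    """Return True if the IP echoed back by icanhazip equals the proxy's host."""
    host = proxy.rsplit(":", 1)[0]  # "1.2.3.4:8080" -> "1.2.3.4"
    return body.strip() == host


def check_proxy(proxy: str, timeout: float = 5.0) -> bool:
    """Fetch icanhazip.com through `proxy` ("host:port") and verify the echoed IP."""
    handler = urllib.request.ProxyHandler({
        "http": "http://" + proxy,
        "https": "http://" + proxy,
    })
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open("http://icanhazip.com/", timeout=timeout) as resp:
            if resp.status != 200:
                return False
            body = resp.read().decode("utf-8", errors="replace")
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and timeouts are OSError subclasses;
        # malformed replies from a broken proxy raise HTTPException
        return False
    return proxy_is_used(proxy, body)
```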

Valid IPs are written to a file for the crawler to use.

# Append one "host:port TYPE" line per valid proxy; wash() (not shown) cleans the strings
def write(ip, ipType):
    with open("ip池.txt", "a", encoding="utf-8") as f:
        f.write(wash(ip) + ' ' + wash(ipType) + '\n')
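`write` appends in mode `"a"`, so running the crawler repeatedly can leave duplicate lines in `ip池.txt`. On the consuming side, a crawler might load the file, drop duplicates, and pick a random proxy in the `proxies` mapping format that requests accepts. This sketch assumes the `host:port TYPE` line format written above. `load_pool` and `random_proxies` are hypothetical names.

```python
import random
from typing import Dict, List, Tuple


def load_pool(path: str) -> List[Tuple[str, str]]:
    """Read 'host:port TYPE' lines, skipping blank, malformed and duplicate lines."""
    seen = set()
    pool = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) != 2:
                continue  # ignore blank or malformed lines
            entry = (parts[0], parts[1].upper())
            if entry not in seen:
                seen.add(entry)
                pool.append(entry)
    return pool


def random_proxies(pool: List[Tuple[str, str]]) -> Dict[str, str]:
    """Pick one proxy and build a requests-style `proxies` mapping.

    Raises IndexError if the pool is empty.
    """
    address, _ip_type = random.choice(pool)
    # Free HTTP(S) proxies are usually addressed with an http:// URL
    # for both target schemes
    return {"http": "http://" + address, "https": "http://" + address}
```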
