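Note 1: This script is meant for WordPress sites that use a Redis cache.

Note 2: The script only fetches the links found on the current page. If you want to crawl links more deeply, modify the code yourself.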
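As a rough idea of what "crawling deeper" could look like, the sketch below follows same-host links breadth-first up to a fixed depth. The names `crawl`, `LinkParser`, `max_depth` and the `fetch` callback are illustrative and do not come from the original script. It uses the standard library's `html.parser` instead of bs4 only to stay dependency-free.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl(start_url, fetch, max_depth=2):
    """Breadth-first crawl of same-host pages, at most max_depth links away.

    fetch is any callable that takes a URL and returns the page HTML as a
    string (assumption: the caller supplies it).
    Returns the list of URLs visited, in visiting order.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    visited = []
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        visited.append(url)
        if depth >= max_depth:
            continue  # do not expand links beyond the depth limit
        parser = LinkParser()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links against the current page, drop fragments
            absolute = urljoin(url, href).split("#")[0]
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append((absolute, depth + 1))
    return visited
```

With requests installed, `fetch` could be something like `lambda u: requests.get(u, timeout=10).text`. Restricting the crawl to the starting host and tracking `seen` URLs keeps it from wandering off-site or looping.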

First install the runtime environment, then install the modules with pip:

yum install epel-release -y
yum install python34 python34-pip
pip3 install bs4 requests request

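The pip output below lists the installed modules. The script itself needs only bs4 (which pulls in beautifulsoup4) and requests; the separate `request` package and its dependencies are not required, because `urllib.request` is part of the Python standard library.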

Requirement already satisfied: bs4 in /usr/lib/python3.4/site-packages (0.0.1)
Requirement already satisfied: requests in /usr/lib/python3.4/site-packages (2.18.4)
Requirement already satisfied: request in /usr/lib/python3.4/site-packages (1.0.1)
Requirement already satisfied: beautifulsoup4 in /usr/lib/python3.4/site-packages (from bs4) (4.6.0)
Requirement already satisfied: urllib3<1.23,>=1.21.1 in /usr/lib/python3.4/site-packages (from requests) (1.22)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /usr/lib/python3.4/site-packages (from requests) (3.0.4)
Requirement already satisfied: idna<2.7,>=2.5 in /usr/lib/python3.4/site-packages (from requests) (2.6)
Requirement already satisfied: certifi>=2017.4.17 in /usr/lib/python3.4/site-packages (from requests) (2018.4.16)
Requirement already satisfied: get in /usr/lib/python3.4/site-packages (from request) (1.0.2)
Requirement already satisfied: post in /usr/lib/python3.4/site-packages (from request) (1.0.1)
Requirement already satisfied: setuptools in /usr/lib/python3.4/site-packages (from request) (39.1.0)
Requirement already satisfied: query_string in /usr/lib/python3.4/site-packages (from get->request) (1.0.1)
Requirement already satisfied: public in /usr/lib/python3.4/site-packages (from query_string->get->request) (1.0.2)
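A quick way to confirm the environment is ready, instead of reading through the pip output, is to ask Python which modules it can find. This is only a sketch; `missing_modules` is a hypothetical helper name, and it is meant for top-level module names only.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Modules the script imports: bs4 and requests come from pip,
# urllib is part of the standard library.
print(missing_modules(["bs4", "requests", "urllib"]))
```

An empty list means everything the script imports is available.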

Python code

#__author__ = '6ns.net'
# -*- coding: utf-8 -*-

from bs4 import BeautifulSoup
from urllib import request
import requests

url = 'https://6ns.net/blog'  # replace with your own domain

# Fetch the page and collect every <a> tag on it
html = request.urlopen(url)
soup = BeautifulSoup(html, "html.parser")
links = soup.find_all(name='a')
for link in links:
    href = link.get('href')
    # Only absolute links are requested; relative links are skipped
    if str(href)[:4] == 'http':
        code = requests.get(href).status_code
        print("url:", href, "  http:", code)
print('end')

Run it with Python 3:

python3 link.py

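How it works: the script extracts every hyperlink on the page and requests each one to read its status code. That is all there is to it.

Screenshot after running: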
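The same principle can be sketched as two small functions, with duplicate links removed and failed requests recorded instead of aborting the run. This is an illustration, not the original script: `http_links`, `check_links` and the `get_status` callback are made-up names, and the parser comes from the standard library rather than bs4.

```python
from html.parser import HTMLParser


class AnchorParser(HTMLParser):
    """Collects the href of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(v for k, v in attrs if k == "href" and v)


def http_links(html):
    """Return the unique absolute http(s) hrefs in html, in document order."""
    parser = AnchorParser()
    parser.feed(html)
    seen, result = set(), []
    for href in parser.hrefs:
        if href.startswith(("http://", "https://")) and href not in seen:
            seen.add(href)
            result.append(href)
    return result


def check_links(html, get_status):
    """Map each link to its status code, or to None when the request fails.

    get_status is any callable taking a URL and returning a status code
    (assumption: supplied by the caller).
    """
    results = {}
    for href in http_links(html):
        try:
            results[href] = get_status(href)
        except Exception:
            results[href] = None  # keep going; one bad link should not stop the run
    return results
```

With requests installed, `get_status` could be `lambda u: requests.get(u, timeout=10).status_code`, so a slow or unreachable link does not hang the whole pass.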

Spectre

About the author: Being criticized by one person means you have not done well enough. Being criticized by a crowd means you have already succeeded.
