How to quickly collect activity posts into a roundup? Moderators, come on in. nearby 文学城

nearby 2022-04-10 23:24:23

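At 妖妹's request, 虎哥 has modified this program so that its output is exactly the format 妖妹 needs. Don't worry, 妖妹: nobody should collect activity posts by hand, it is exhausting work. With this modified Python program, collecting the activity posts is almost fully automatic.

You can copy/paste the code below into a Python file. If you have Python 3 installed on your computer, you can then follow the prompts to collect the activity posts almost fully automatically.

Congratulations on the great success of 妖妹's activity!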

# Author: 书香之家版主 nearby, March 2022
#
# Usage of this Python program:
# 0. Make sure that you have Internet access and Python 3 installed on your computer (or use a cloud environment).
#    The program also needs the 'requests' package (pip install requests).
# 1. Place this file in a folder, say, a folder named "wxc".
# 2. Go to your '论坛' and search for your '活动' title. You will get one or more pages of results. Remember how many pages there are.
#       If you do not know how to do this, just skip this step; the program will then assume that there are 3 pages (150 entries, which is more than usual).
# 3. Run this program. You will be prompted for the name of your activity and
#    the number of pages you obtained in step 2 (if you do not know the number of pages, just hit ENTER).
#    Example:
#               春天的畅想
#               3 (or hit the ENTER key)
# 4. You will also be prompted for your 论坛's name in English letters. You can look this up in your 论坛's URL.
#    For example, 书香之家 has the URL https://bbs.wenxuecity.com/sxsj/, so its English name is sxsj.
#    Other examples include: 美语世界 is mysj, 文化走廊 is culture, 诗词欣赏 is poetry, etc.
# 5. The result is stored in 'wxc/sxzj-out.html'. You can then copy/paste the source code of 'sxzj-out.html' into your new WXC post. Done!
#
#
# Note: By default the entries are organized in reverse chronological order.
# Should you need them in chronological order instead, comment out the statement
# mylist.reverse() by placing # in front of it, like this: #mylist.reverse()
#
#

import requests


notargets = ['跟帖', '输入关键词', '内容查询', 'input name', '当前', '首页', '上一页', '尾页', '下一页']
notargets.append('archive')
# This is how SXZJ (书香之家) works. When 无忧 starts an activity, she always marks her activity like this.
notargets.append('##活动##')
# notargets.append('汇总')


def isInside(line, notargets_array):
    for t in notargets_array:
        if t in line:
            return True
    return False
# END

# the line looks like <a href="/sxsj/76799.html" target="_blank">【<em>春天的畅想</em>】春天属于女人</a>
# I need it to be <a href="https://bbs.wenxuecity.com/sxsj/76799.html" target="_blank">【<em>春天的畅想</em>】春天属于女人</a>
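def addHttp(line):
    # split only at the first href so the rest of the line is kept intact
    at = line.split('href="', 1)
    if len(at) < 2:
        # no link on this line; return it unchanged instead of crashing
        return line
    line2 = '<a href="https://bbs.wenxuecity.com' + at[1]
    return line2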
# END

def processOneFile(target, html, mylist):
    # split the text by newline character to get an array of strings
    # (named 'lines' rather than 'all' so the built-in all() is not shadowed)
    lines = html.text.split('\n')
    length = len(lines)
    i = 0
    while i < length:
        line = lines[i]
        if (target in line) and (not isInside(line, notargets)):
            line = addHttp(line)
            print(line)
            i = i + 1
            # the author's name is on the next line; guard against running past the end
            line2 = lines[i] if i < length else ''
            # looks like: [书香之家] - <strong>WXCTEATIME</strong>(6987 bytes ), need to be WXCTEATIME only
            parts = line2.replace('</strong>', '<strong>').split('<strong>')
            if len(parts) > 1:
                line += "  " + parts[1]
            mylist.append(line)
        i = i + 1
# END of FUNCTIONS


# ---- main starts here ----

print()
print('# Author: 书香之家版主 nearby, March 2022')
print()

target = input('What is the title of your activity (活动)?:  ')
pages = 3 # default: at most 150 entries
temp = input('How many pages are there when you search for the activity in WXC? (If you do not know, just hit ENTER): ').strip()
if temp != '':
    pages = int(temp)

subid = 'sxsj'
temp = input('What is the name of your 论坛 in English? For example, 书香之家 is sxsj, 美语世界 is mysj, 文化走廊 is culture, 诗词欣赏 is poetry: ')
if len(temp) >= 2:
    subid = temp

mylist = []
# this is the output file.
html2 = open('sxzj-out.html', 'w', encoding='utf-8')

url = 'https://bbs.wenxuecity.com/bbs/archive.php?SubID='+subid+'&pos=bbs&keyword=' + target + '&username='

f = requests.get(url)
processOneFile(target, f, mylist)
for i in range(1, pages):
    url = 'https://bbs.wenxuecity.com/bbs/archive.php?page=' + str(i) + '&SubID=' + subid +'&pos=bbs&keyword=' + target + '&username='
    f = requests.get(url)
    processOneFile(target, f, mylist)

mylist.reverse()
for li in mylist:
    html2.write("<p>" + li+"\n")
html2.close()

print("\n")
print(str(len(mylist)) + " entries")
print("\n")
print("Please check the file sxzj-out.html. The result is in it! Thanks for using this program. ---- 虎哥 / Nearby ")
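
A side note on the search URLs: the program builds them by joining strings, which works for an ordinary title, but a title containing characters such as & or # would probably break the query. Below is a minimal sketch of one way to build the same URLs with the standard library's urlencode. The helper name build_search_url is hypothetical (it is not part of the program above), and I am assuming the site accepts UTF-8 percent-encoded keywords; the parameter names are the ones the program already uses.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Same search endpoint the program above uses.
BASE = 'https://bbs.wenxuecity.com/bbs/archive.php'


def build_search_url(subid, keyword, page=None):
    """Hypothetical helper: build the archive search URL with safe encoding.

    page=None mirrors the program's first request (no page parameter);
    otherwise the page number is sent as-is, like the program's loop does.
    """
    params = {}
    if page is not None:
        params['page'] = page
    params['SubID'] = subid
    params['pos'] = 'bbs'
    params['keyword'] = keyword  # urlencode percent-encodes this as UTF-8
    params['username'] = ''
    return BASE + '?' + urlencode(params)


# No network access is needed to try this; it only builds strings.
url = build_search_url('sxsj', '春天 & 畅想', page=1)
print(url)
# Decoding the query again gives back the original keyword.
print(parse_qs(urlsplit(url).query)['keyword'])
```

If you want to try it, you could replace the two string-built url assignments with build_search_url(subid, target) and build_search_url(subid, target, i); everything else stays the same.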

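妖妖灵 2022-04-10 23:28:59

Wow! 虎哥 has really done something great for the site, the forum, and everyone in it! :) Think how much time and energy this will save! Thank you so much for the hard work!!!

nearby 2022-04-10 23:34:31

Thank you too, 妖妹! All the best!

妖妖灵 2022-04-10 23:41:12

With the human brain directing computers to do things this powerful, no wonder CS is so wildly popular :) Sincere thanks, 虎哥, this help came right when it was needed :)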
