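A Python program for tallying forum users' activity. 邻兄 refuses to post it as a reply, so it is parked here for the record; anyone who needs it is welcome to use it. nearby, 文学城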

nearby 2022-08-28 19:05:42

# Author: 书香之家 moderator nearby, August 2022
#
# This program analyzes the activity of all users in a WXC forum (论坛), for example 书香之家 (sxsj).
# For each user, it counts the number of top-level posts (主帖) and replies (跟帖) separately.
# The result is written to a .csv file encoded as UTF-8. Note: Excel may not display the Chinese
# characters correctly when it opens the file directly. If so, open the file in Notepad or another
# text editor and copy/paste the result into an Excel sheet.
#

import requests

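# us_dict: a dictionary with key = username and value = a two-element list.
# The first element is the number of 主帖 (top-level posts); the second is the number of 跟帖 (replies).
# html: the response object returned by requests.get() for one page of the forum.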
def processOneFile(us_dict, html):
    all = html.text.split('\n')
    length = len(all)
    i = 0
    while i < length:
        line = all[i].strip()
        jump = 6
        if line == '<!--   -->':
            i = i + 1
            line = all[i].strip()
            if line == '<!-- 列表中插广告 -->':
                jump = 9
            i = i + jump
            # print(all[i].strip())
            # this is a 主帖. get the user name first
            i = i + 3
            # the line looks like: <a class="b" href="https://passport.wenxuecity.com/members/index.php?act=profile&amp;cid=ling_yin_shi">ling_yin_shi</a>
            user = all[i].strip().split('>')[1].split('<')[0]
            # add one to this user's 主帖 count
            if user in us_dict:
                L = us_dict[user]
                L[0] = L[0] + 1
            else:
                L = [1,0]
                us_dict[user] = L
            # Now, process the 跟帖
            i = i + 1
            line = all[i].strip()
            while line != '</div>':
                # target this line: <a class="b"  href="https://passport.wenxuecity.com/members/index.php?act=profile&amp;cid=FionaRawson">FionaRawson</a> -
                if line.startswith('<a class="b"  href='):
                    sub_user = line.split('>')[1].split('<')[0]
                    # add one to this user's 跟帖 count. When guanshui is True, replies that
                    # users make under their own 主帖 are skipped.
                    if sub_user != user or not guanshui:
                        if sub_user in us_dict:
                            L = us_dict[sub_user]
                            L[1] = L[1] + 1
                        else:
                            L = [0, 1]
                            us_dict[sub_user] = L
                i = i + 1
                line = all[i].strip()

        i = i + 1



# ---- main starts here ----

print()
print('# Author: 书香之家版主 nearby, August 2022')
print()

subid = 'sxsj'
temp = input('What is the name of your 论坛 in English? For example, 书香之家 is sxsj, 美语世界 is mysj, 文化走廊 is culture, 诗词欣赏 is poetry: ')
if len(temp) >= 2:
    subid = temp

numPages = 200
temp = input('How many pages would you like to search? If you do not know, just hit ENTER and the program will search 200 pages by default. ')
if len(temp) >= 1:
    numPages = int(temp)

guanshui = False # Use this variable because of kirn's talking about 灌水 :-)
temp = input('Discard the 跟帖 that users post under their own 主帖? (1=yes, 0=no, default=0)\n' +
             'Sometimes a user only posts 跟帖 under his/her own 主帖. If yes, such 跟帖 will be discarded.  ')
# An empty answer (just ENTER) keeps the default; int('') would raise ValueError.
if temp.strip() and int(temp) > 0:
    guanshui = True

print('guanshui='+str(guanshui))

users = dict()
for i in range(1, numPages+1):
    url = 'https://bbs.wenxuecity.com/' + subid + '/?page=' + str(i)
    f = requests.get(url, timeout=30)  # give up on a stalled connection instead of hanging
    processOneFile(users, f)

print("\n---------------\n")
ks = users.keys()
with open('sxzj-out.csv', 'w', encoding='utf-8') as html2:
    for u in ks:
        L = users[u]
        print(u + ',' + str(L[0]) + ',' + str(L[1]))
        html2.write(u + ',' + str(L[0]) + ',' + str(L[1]) + '\n')
print("\n")
print("\n")
print("Please check the file sxzj-out.csv. The result is in it! Thanks for using this program. ---- 虎哥 / Nearby / 邻兄")
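
The header comment mentions that the CSV file does not show Chinese characters well. One possible workaround, offered only as a sketch and not as part of the original program, is to write the file with the `utf-8-sig` encoding, which prepends a byte-order mark that Excel generally uses to recognize UTF-8. The function name `write_counts`, the header row, the file name `sample-out.csv`, and the sample numbers are all made up for illustration.

```python
import csv

def write_counts(path, users):
    """Write {username: [posts, replies]} to a CSV file that starts with a UTF-8 BOM."""
    # 'utf-8-sig' prepends the BOM; newline='' lets the csv module control line endings.
    with open(path, 'w', encoding='utf-8-sig', newline='') as out:
        writer = csv.writer(out)
        writer.writerow(['user', 'posts', 'replies'])  # hypothetical header row
        for name, (posts, replies) in users.items():
            writer.writerow([name, posts, replies])

# Made-up sample data in the same shape as the `users` dictionary in the program.
sample = {'尘凡无忧': [3, 12], 'ling_yin_shi': [5, 7]}
write_counts('sample-out.csv', sample)
```

Using `csv.writer` instead of joining with commas by hand also quotes any username that happens to contain a comma.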

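For anyone who wants to check the counting rule without downloading any pages, here is a small offline sketch of the same idea. It assumes the thread has already been reduced to one poster and a list of replier names; the helper names `extract_username` and `tally_thread` and the parameter `skip_own_replies` (which plays the role of `guanshui`) are hypothetical and do not appear in the program above.

```python
def extract_username(anchor_line):
    """Return the text between the first '>' and the next '<' of a profile link."""
    return anchor_line.split('>')[1].split('<')[0]

def tally_thread(counts, poster, repliers, skip_own_replies=False):
    """Add one 主帖 for `poster` and one 跟帖 for each name in `repliers`."""
    counts.setdefault(poster, [0, 0])[0] += 1
    for name in repliers:
        # Optionally ignore replies that the poster made under his or her own post.
        if skip_own_replies and name == poster:
            continue
        counts.setdefault(name, [0, 0])[1] += 1
    return counts

# The sample link line is the one quoted in the program's comments.
line = ('<a class="b" href="https://passport.wenxuecity.com/members/index.php'
        '?act=profile&amp;cid=ling_yin_shi">ling_yin_shi</a>')
poster = extract_username(line)
counts = tally_thread({}, poster, ['FionaRawson', 'ling_yin_shi'], skip_own_replies=True)
print(counts)
```

With `skip_own_replies=True`, the poster's reply to their own thread is not counted, which matches answering 1 at the program's last prompt.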
尘凡无忧 2022-08-28 19:09:13

Liking this sight unseen. 邻兄 is too nice. :)

kirn 2022-08-28 19:41:41

You are really wicked!

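FionaRawson 2022-08-28 20:19:12

All I can do is admire this... Let me use this spot to tell 无忧 something: 无忧 suggested earlier that the COVID (新冠) event be extended by a week. I thought it over,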

尘凡无忧 2022-08-28 20:26:59

Ah, when I wrote my comment above I had not even seen this one of yours... Great minds think alike, let's shake on it. :)

lovecat08 2022-08-28 20:27:35

Wickedly, utterly, impressed!

尘凡无忧 2022-08-28 20:41:34

The event is extended to September 10. I know 高妹 still has a lot she wants to say... but go by your own schedule. :)

FionaRawson 2022-08-28 20:51:39

Thank you

妖妖灵 2022-08-28 22:03:45

虎哥, how would you translate 活雷锋 ("a living Lei Feng") into English? :)

望沙 2022-08-28 22:33:55

ling_yin_shi 2022-08-29 00:37:31

This is skill. And it is love, too. :)
