Pro-Russian Narratives Target Wikipedia, Marking a Dangerous Trend for AI Chatbot Data

4mon 16d ago by lemmy.world/u/wikipediasuckscoop in technology from united24media.com

Annnnnd that's why I downloaded a snapshot of Wikipedia a few months ago and host it locally.

Sad that it's necessary, but with modern AI tooling, we have everything we need to destroy knowledge on an industrial scale.

How do you self-host Wikipedia? Any good guides on how to do it?

Wikipedia has guides for it; check the "Downloading Wikipedia" section. The most popular offline client at the moment is the Kiwix reader.

Kiwix is the easiest way to do it; if you have Docker/Kubernetes, there's a Docker image at ghcr.io/kiwix/kiwix-serve, and the K8s manifest to deploy is as simple as:

apiVersion: v1
kind: Service
metadata:
  name: wikipedia-service
spec:
  selector:
    app: kiwix-server
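  ports:
  - port: 80
    targetPort: 8080
  # Headless Service: DNS returns the pod IP directly and no port mapping
  # is applied, so clients must connect on 8080 rather than 80.
  clusterIP: None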
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wikipedia-server
  labels:
    app: kiwix-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kiwix-server
  template:
    metadata:
      name: wikipedia-server
      labels:
        app: kiwix-server
    spec:
      containers:
      - name: kiwix-server
        image: ghcr.io/kiwix/kiwix-serve:3.8.0
        imagePullPolicy: IfNotPresent
        command:
        - /usr/local/bin/kiwix-serve
        - --port=8080
        - --verbose
        - /data/wikipedia_en_all_maxi.zim
        ports:
        - containerPort: 8080
          protocol: TCP
        volumeMounts:
        - name: data
          mountPath: /data
          readOnly: true
        # Limits belong under the container's "resources" key, not under a volume mount.
        resources:
          limits:
            memory: "128Mi"
            cpu: "2000m"
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: wikipedia-mirror

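Then you just need to download a copy of the mirror file wikipedia_en_all_maxi.zim and put it on the volume that gets mounted at /data: wget https://download.kiwix.org/zim/wikipedia_en_all_maxi.zim (the files on download.kiwix.org are published with a date suffix in the name, so check the directory listing and either rename the file or adjust the path in the manifest).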
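One thing the manifest above takes for granted is the PersistentVolumeClaim named wikipedia-mirror that the Deployment mounts. Here is a minimal sketch of what that claim could look like. The access mode, the storage size, and the reliance on the cluster's default StorageClass are all assumptions, not something stated above, so adjust them to your cluster and to the size of the .zim file you actually download.

```yaml
# Hypothetical claim backing the "data" volume in the Deployment above.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: wikipedia-mirror      # must match claimName in the Deployment
spec:
  accessModes:
  - ReadWriteOnce             # assumption: a single node mounts the volume
  resources:
    requests:
      storage: 150Gi          # assumption: size it to fit the .zim you download
```

Since the Deployment mounts the volume read-only, the .zim file has to be copied onto it some other way first, for example from a temporary pod that mounts the same claim read-write.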

As soon as you leave English-language Wikipedia, this happens fairly often. Not necessarily Russian, maybe adjacent. And not just since yesterday! I noticed it around Corona, and it's been a problem for way longer. It's relatively easy for one editor to slip through unnoticed if there aren't enough eyeballs on the article, and hey, then they can write what their overlords tell them, unchallenged.

"that some believe may be part of a Russian influence operation"

Jeebus, what a bunch of drivel propaganda. The whole site is the same kind of nonsense fantasy. They just jump along with the dumdum official propaganda narrative about the baad baad Russia. Incredible that people still haven't figured out that they were conned, played, by US/Western elites.

Oh well, fortunately Russia will survive NATO's proxy war against them, and they'll survive primitive propaganda such as this 'article'.