mirror of https://github.com/iiab/iiab.git synced 2025-02-13 11:42:08 +00:00

Merge pull request #1617 from mitra42/mitra

Mitra - 'internetarchive' & 'yarn' Ansible playbooks for IIAB
A Holt 2019-05-25 11:26:05 -04:00 committed by GitHub
commit a3d7791f2a
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
14 changed files with 440 additions and 0 deletions


@ -21,6 +21,12 @@
  when: minetest_install | bool
  tags: minetest

- name: INTERNETARCHIVE
  include_role:
    name: internetarchive
  when: internetarchive_install | bool
  tags: internetarchive

- name: Recording STAGE 9 HAS COMPLETED ====================
  lineinfile:
    dest: "{{ iiab_env_file }}"


@ -0,0 +1,223 @@
# Internet Archive Universal Library / Decentralized Web README

The Internet Archive (http://archive.org) is famous for their WayBack Machine
that has saved 362+ Billion web pages, and more recently their Decentralized
Web project.

This Ansible role installs the Internet Archive's dweb-mirror project on
Internet-in-a-Box (IIAB). Use this to build up a dynamic offline library
arising from the materials you can explore at http://dweb.archive.org

The project is a local server that allows users to browse resources from the
Internet Archive stored on local drives - including USB drives.

It includes a crawler that can regularly synchronize local collections, against
a list of Internet Archive items and collections, and those collections can be
moved between installations.

When connected to the internet, the server works as a Proxy, i.e. it will store
Internet Archive (IA) content the user views for later off-line viewing.

There are components to integrate the IA server with decentralized tools
including IPFS, WebTorrent, GUN, WOLK, both for fetching content and for
serving it back to the net or locally.

This is an ongoing project, continually adding support for new Internet Archive
content types; new platforms; and new decentralized transports.

## Using it
### Starting server
The server is started and restarted automatically. It can be turned on or off
at a terminal window with `service internetarchive start` or `service
internetarchive stop`
### Browsing
The server can be accessed at [http://box:4244](http://box:4244) or
[http://box.lan:4244](http://box.lan:4244) (try
[http://box.local:4244](http://box.local:4244) via mDNS over a local network,
if you don't have name resolution set up to reach your Internet-in-a-Box).
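
As a rough sketch (not part of IIAB or dweb-mirror), the hostnames above can be probed from a client's shell to find one that responds. The function names `candidate_urls` and `first_reachable_url` are made up for illustration, and the port defaults to the 4244 used above.

```shell
# Print the candidate base URLs named above, one per line, for a given port.
candidate_urls() {
    port="${1:-4244}"
    for host in box box.lan box.local; do
        printf 'http://%s:%s\n' "$host" "$port"
    done
}

# Print the first candidate URL that answers an HTTP request, or return 1.
# curl flags: -s silent, -o /dev/null discards the body, --max-time caps each try.
first_reachable_url() {
    for url in $(candidate_urls "${1:-4244}"); do
        if curl -s -o /dev/null --max-time 5 "$url"; then
            printf '%s\n' "$url"
            return 0
        fi
    done
    return 1
}
```

Running `first_reachable_url 4244` on a device connected to the box prints the first address that works, which can then be pasted into a browser.
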
_In future, we also hope to get [http://box/archive](http://box/archive) and
[http://box.lan/archive](http://box.lan/archive) working (as of 2019-05-25 the
error "Cannot GET /archive" appears — if you can help us fix
[/etc/apache2/sites-available/internetarchive.conf](https://github.com/iiab/iiab/blob/master/roles/internetarchive/templates/internetarchive.conf)
that would be incredible!)_
If you don't get an Archive UI, then look at the server log (in the browser
console) for any “FAILING” log lines, which indicate a problem.

Expect to see errors in the browser log for
`http://localhost:5001/api/v0/version?stream-channels=true`, which is checking
for a local IPFS server that is not started here.

On slower machines or slower network connections, expect to see no images the
first time; refresh after a little while and most should appear.
## Administration
Administration is carried out through the same User Interface as browsing.
Access [http://localhost:4244/local](http://localhost:4244/local) to see a
display of local content. This interface is under development and various admin
tools will be added here. *At some point this will become the default page*.
Access [http://localhost:4244](http://localhost:4244) to get the Internet
Archive main interface if connected to the net.
While viewing an item or collection, the "Crawl" button in the top bar
indicates whether the item is being crawled or not. Clicking it will cycle
through three levels:
* No crawling
* Details - sufficient information will be crawled to display the page; for a
collection this also means getting the thumbnails and metadata for the top
items.
* Full - crawls everything on the item. This can be a LOT of data, including
full-size videos etc., so use with care if bandwidth/disk is limited.
### Disks
The server checks for caches of content in directories called `archiveorg` in
all the likely places. In particular, it looks in `/media/pi/*archiveorg` for
any inserted USB drives, and if none are found, it uses `/library/archiveorg`.

The list of places it checks in an unmodified installation can be seen at
`https://github.com/internetarchive/dweb-mirror/blob/master/configDefaults.yaml#L7`.
You can override this in `dweb-mirror.config.yaml` in the home directory of the
user that runs the server; this is currently `/root/dweb-mirror.config.yaml`
(see 'Advanced' below).

Archive's `Items` are stored in subdirectories of the first of these
directories found, but are read from any of the locations.
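
The lookup order described above can be sketched in shell. This only illustrates the documented behavior and is not dweb-mirror's own code; `first_existing_dir` is a hypothetical helper name.

```shell
# Print the first argument that is an existing directory; return 1 if none is.
first_existing_dir() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Usage with the paths from the text (the glob expands to any matching USB
# mounts; if nothing matches, the literal pattern simply fails the -d test):
#   first_existing_dir /media/pi/*archiveorg /library/archiveorg
```

The printed path is where new items would be expected to land, which is handy to check before starting a large crawl.
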
If your disk space is getting full, it's perfectly safe to delete any
subdirectories, or to move them to an attached USB drive. It's also safe to
move attached USB drives from one device to another.

The one directory you should not move or delete is `archiveorg/.hashstore` in
any of these locations. The server will refetch anything else it needs if you
browse to the item again when connected to the internet.
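
Before freeing space, it can help to list which subdirectories are candidates. The following is a minimal sketch, assuming GNU or BSD `find` (for `-mindepth` and `-maxdepth`); `list_item_dirs` is a hypothetical helper name, and the default path is the `/library/archiveorg` mentioned above.

```shell
# List the item subdirectories of a cache directory, one per line and sorted,
# skipping .hashstore (the one directory that should not be moved or deleted).
list_item_dirs() {
    find "${1:-/library/archiveorg}" -mindepth 1 -maxdepth 1 -type d \
        ! -name .hashstore | sort
}

# Usage: show how much space each item uses.
#   list_item_dirs /library/archiveorg | while read -r d; do du -sh "$d"; done
```
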
### Maintenance
Run the following if you are worried about corruption, or after (for example)
hand-editing or moving cached items around.
```
# Run everything as root
sudo sh
# cd into location for your installation
cd /opt/iiab/internetarchive/node_modules/@internetarchive/dweb-mirror
./internetarchive -m
```
This will usually take about 5-10 minutes depending on the amount of material
cached, just to rebuild a table of checksums.
### Advanced
Most functionality of the tool is controlled by two YAML files, the second of
which you can edit if you have access to the shell.
You can view the current configuration by going to
[http://box.lan:4244/info](http://box.lan:4244/info) or
[http://localhost:4244/info](http://localhost:4244/info) depending on how you
are connected.
The default and user configurations are displayed as the `0` and `1` items in
the `/info` call.
In the Repo is a
[default YAML file](https://github.com/internetarchive/dweb-mirror/blob/master/configDefaults.yaml)
which is commented. It would be a bad idea to edit this, so I'm not going to
tell you where it is on your installation! But anything from this file can be
overridden by lines in `/root/dweb-mirror.config.yaml`. Make sure you
understand how YAML works before editing this file. If you break it, you can
copy a new default from
[dweb-mirror.config.yaml on the repo](https://github.com/internetarchive/dweb-mirror/blob/master/dweb-mirror.config.yaml).
TODO Note this file will probably move location.
Note that this file is also edited automatically when the Crawl button
described above is clicked.
As the project develops, this file will be editable via a UI.
## Update
Dweb-mirror is under rapid development, as is the JavaScript UI. It's
recommended to update frequently.
From a Terminal window
```
sudo sh # Run all commands as root
cd /opt/iiab/internetarchive
yarn upgrade # Currently this can take up to about 20 minutes to run, we hope to reduce that time
```
## Crawling
The Crawler will be built into the UI fairly soon; for now it has to be run in
a terminal window.
It's highly configurable, either through the YAML file described above or from
the command line.
In a shell:
```
# Run all commands as root from dweb-mirror's directory
sudo sh
# cd into location for your installation
cd /opt/iiab/internetarchive/node_modules/@internetarchive/dweb-mirror
# To get a full list of possible arguments
./internetarchive --help
# Perform a standard crawl
./internetarchive --crawl
# To fetch the "foobar" item from IA.
./internetarchive --crawl foobar
# To crawl the top 10 items in the prelinger collection in enough detail to
# display them, and put them on a disk plugged into /media/pi/xyz
# TODO check where pi actually put them.
./internetarchive --copydirectory /media/pi/xyz/archiveorg --crawl --rows 10 --level details prelinger
```
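
To crawl several items at the same level, one option is to generate the commands first and review them. This is a sketch only: `crawl_commands` is a hypothetical helper, and it assumes that `--crawl --level <level> <identifier>` works without the `--copydirectory` and `--rows` flags used in the example above. The identifiers `prelinger` and `foobar` are the ones from that example.

```shell
# Print one crawl command per identifier, using only flags shown above.
# Nothing is executed here; the output is meant to be reviewed first.
crawl_commands() {
    level="$1"
    shift
    for id in "$@"; do
        printf './internetarchive --crawl --level %s %s\n' "$level" "$id"
    done
}

# Usage, from the dweb-mirror directory and as root:
#   crawl_commands details prelinger foobar        # review the commands
#   crawl_commands details prelinger foobar | sh   # then run them
```
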
## Troubleshooting
There are two logs of relevance, the browser and the server.
**Browser**: If using Chrome then this is at View / Developer Tools /
JavaScript Console or something similar.
**Server**:
From a Terminal window.
```
journalctl -u internetarchive
```
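
The server log can be long, so it may help to filter it for the “FAILING” lines mentioned under Browsing. A minimal sketch, where `failing_lines` is a hypothetical helper name:

```shell
# Keep only log lines that contain the marker "FAILING" (read from stdin).
# Always succeeds, even when nothing matches, so it is safe under `set -e`.
failing_lines() {
    grep -F 'FAILING' || true
}

# Usage on the box (journalctl flags: -u selects the unit, --no-pager writes
# straight to stdout instead of opening a pager):
#   journalctl -u internetarchive --no-pager | failing_lines
```
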
## Known Issues
See the
[GitHub dweb-mirror issues](https://github.com/internetarchive/dweb-mirror/issues)
and
[GitHub dweb-archive issues](https://github.com/internetarchive/dweb-archive/issues).
## More info
Dweb-Mirror lives on GitHub at:
* [dweb-mirror](https://github.com/internetarchive/dweb-mirror)
* [source](https://github.com/internetarchive/dweb-mirror)
* [issues](https://github.com/internetarchive/dweb-mirror/issues)
* [API.md](./API.md) API documentation for dweb-mirror
This project is part of the Internet Archive's larger Dweb project, see also:
* [dweb-universal](https://github.com/internetarchive/dweb-universal) info about others distributing the web
* [dweb-transport](https://github.com/internetarchive/dweb-transport) miscellaneous incl GUN gateway and WebTorrent
* [dweb-objects](https://github.com/internetarchive/dweb-objects) library of dweb objects
* [dweb-archive](https://github.com/internetarchive/dweb-archive) archive UI in JavaScript
* [dweb-archivecontroller](https://github.com/internetarchive/dweb-archivecontroller) knows about the structure of archive objects


@ -0,0 +1,9 @@
# internetarchive_install: False
# internetarchive_enabled: False
# internetarchive_port: 4244
# All above are set in: github.com/iiab/iiab/blob/master/vars/default_vars.yml
# If nec, change them by editing /etc/iiab/local_vars.yml prior to installing!
internetarchive_dir: '{{ iiab_base }}/internetarchive'


@ -0,0 +1,3 @@
dependencies:
- { role: nodejs, tags: ['nodejs'], when: internetarchive_install | bool }
- { role: yarn, tags: ['yarn'], when: internetarchive_install | bool }


@ -0,0 +1,99 @@
# We need a recent version of node
- name: FAIL (STOP INSTALLING) IF nodejs_version is not set to 10.x
  fail:
    msg: "Internet Archive install cannot proceed, as it currently requires Node.js 10.x, and your nodejs_version is set to {{ nodejs_version }}. Please check the value of nodejs_version in /opt/iiab/iiab/vars/default_vars.yml and possibly also /etc/iiab/local_vars.yml"
  when: internetarchive_install and (nodejs_version != "10.x")

- name: Install packages needed by Distributed Web
  package:
    name:
      - libsecret-1-dev
      - cmake
    state: present

- name: Create directory {{ internetarchive_dir }}
  file:
    path: "{{ internetarchive_dir }}"
    state: directory
    owner: "root"

- name: Run yarn install to get needed modules (CAN TAKE ~5 MINUTES)
  command: sudo yarn add @internetarchive/dweb-archive @internetarchive/dweb-mirror
  args:
    chdir: "{{ internetarchive_dir }}"
  when: internet_available | bool

- name: Create directory /library/archiveorg
  file:
    path: "/library/archiveorg"
    state: directory
    owner: "root"

# CONFIG FILES

- name: "Install from templates: internetarchive.service (systemd), internetarchive.conf (Apache)"
  template:
    src: "{{ item.src }}"
    dest: "{{ item.dest }}"
    mode: 0644
    owner: root
    group: root
  with_items:
    - { src: 'internetarchive.service.j2', dest: '/etc/systemd/system/internetarchive.service' }
    - { src: 'internetarchive.conf', dest: '/etc/apache2/sites-available/internetarchive.conf' }

- name: Create symlink internetarchive.conf from sites-enabled to sites-available, for short URL http://box/archive (if debuntu and internetarchive_enabled)
  file:
    src: /etc/apache2/sites-available/internetarchive.conf
    path: /etc/apache2/sites-enabled/internetarchive.conf
    state: link
  when: is_debuntu and internetarchive_enabled

- name: Remove symlink /etc/apache2/sites-enabled/internetarchive.conf (if debuntu and not internetarchive_enabled)
  file:
    path: /etc/apache2/sites-enabled/internetarchive.conf
    state: absent
  when: is_debuntu and not internetarchive_enabled

# RESTART/STOP SYSTEMD SERVICE

# with "systemctl daemon-reload" in case internetarchive.service changed, etc
- name: Enable & Restart 'internetarchive' systemd service (if internetarchive_enabled)
  systemd:
    name: internetarchive
    daemon_reload: yes
    enabled: yes
    state: restarted
  when: internetarchive_enabled | bool

- name: Disable & Stop 'internetarchive' systemd service (if not internetarchive_enabled)
  systemd:
    name: internetarchive
    daemon_reload: yes
    enabled: no
    state: stopped
  when: not internetarchive_enabled

- name: Restart Apache service ({{ apache_service }}) to enable/disable http://box/archive (not just http://box:{{ internetarchive_port }})
  systemd:
    name: "{{ apache_service }}" # httpd or apache2
    state: restarted
  when: internetarchive_enabled | bool

- name: Add 'internetarchive' variable values to {{ iiab_ini_file }}
  ini_file:
    path: "{{ iiab_ini_file }}"
    section: internetarchive
    option: "{{ item.option }}"
    value: "{{ item.value }}"
  with_items:
    - option: name
      value: Internet Archive Distributed Web
    - option: description
      value: '"Dweb-mirror is intended to make the Internet Archive experience and UI available offline."'
    - option: internetarchive_enabled
      value: "{{ internetarchive_enabled }}"


@ -0,0 +1,8 @@
# internetarchive_port is set to 4244 in vars/default_vars.yml
# If you need to change this, edit /etc/iiab/local_vars.yml prior to installing
RedirectMatch ^/archive.org$ /archive
RedirectMatch ^/internetarchive$ /archive
ProxyPass /archive http://localhost:{{ internetarchive_port }}/archive
ProxyPassReverse /archive http://localhost:{{ internetarchive_port }}/archive


@ -0,0 +1,17 @@
[Unit]
Description=Internet Archive Universal Library service
After=network-online.target
[Service]
Type=simple
WorkingDirectory={{ internetarchive_dir }}/node_modules/@internetarchive/dweb-mirror
ExecStart=/usr/bin/node ./internetarchive -s
Restart=always
RestartSec=10
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=internetarchive
[Install]
WantedBy=multi-user.target


@ -55,6 +55,7 @@ block_DNS={{ block_DNS }}
calibre_port={{ calibre_port }}
calibreweb_port={{ calibreweb_port }}
cups_port={{ cups_port }}
internetarchive_port={{ internetarchive_port }}
kalite_server_port={{ kalite_server_port }}
kiwix_port={{ kiwix_port }}
kolibri_http_port={{ kolibri_http_port }}
@ -142,6 +143,7 @@ if [ "$wan" != "none" ]; then
$IPTABLES -A INPUT -p tcp --dport $calibre_port -m state --state NEW -i $wan -j ACCEPT
$IPTABLES -A INPUT -p tcp --dport $calibreweb_port -m state --state NEW -i $wan -j ACCEPT
$IPTABLES -A INPUT -p tcp --dport $cups_port -m state --state NEW -i $wan -j ACCEPT
$IPTABLES -A INPUT -p tcp --dport $internetarchive_port -m state --state NEW -i $wan -j ACCEPT
$IPTABLES -A INPUT -p tcp --dport $kalite_server_port -m state --state NEW -i $wan -j ACCEPT
$IPTABLES -A INPUT -p tcp --dport $kiwix_port -m state --state NEW -i $wan -j ACCEPT
$IPTABLES -A INPUT -p tcp --dport $kolibri_http_port -m state --state NEW -i $wan -j ACCEPT

roles/yarn/README.rst

@ -0,0 +1,24 @@
.. |ss| raw:: html

   <strike>

.. |se| raw:: html

   </strike>

.. |nbsp| unicode:: 0xA0
   :trim:

==================
yarn README
==================

Yarn is an alternative to npm that is becoming more widely used, though there
is still intense npm vs. yarn debate.

It's used for the internetarchive role for three reasons: it's faster, with
MUCH less confusing error messages; it does a better job of deduplicating
nested modules, reducing disk and bandwidth usage; and, most importantly, the
resulting node_modules is deterministic, so we can reach down and link to inner
modules (dweb-archive and dweb-transports in particular) with certainty about
where they will be.

roles/yarn/tasks/main.yml

@ -0,0 +1,28 @@
- name: "Yarn | GPG"
  apt_key:
    url: https://dl.yarnpkg.com/debian/pubkey.gpg
    state: present

- name: "Yarn | Ensure Debian sources list file exists"
  file:
    path: /etc/apt/sources.list.d/yarn.list
    owner: root
    mode: 0644
    state: touch

- name: "Yarn | Ensure Debian package is in sources list"
  lineinfile:
    dest: /etc/apt/sources.list.d/yarn.list
    regexp: 'deb http://dl.yarnpkg.com/debian/ stable main'
    line: 'deb http://dl.yarnpkg.com/debian/ stable main'
    state: present

- name: "Yarn | Update APT cache"
  apt:
    update_cache: yes

- name: "Yarn | Install"
  package:
    name: yarn
    state: latest
  when: internet_available and is_debuntu


@ -473,6 +473,12 @@ calibreweb_port: 8083 # PORT VARIABLE HAS NO EFFECT (as of January 2019)
calibreweb_url: /books
calibreweb_home: "{{ content_base }}/calibre-web" # /library/calibre-web

# Internet Archive Decentralized Web - create your own offline version box:4244
# (or http://box/archive) arising from digital library https://dweb.archive.org
internetarchive_install: False
internetarchive_enabled: False
internetarchive_port: 4244 # for http://box:4244

# Minetest is an open source clone of the Minecraft building blocks game
minetest_install: False
minetest_enabled: False


@ -313,6 +313,11 @@ calibreweb_port: 8083 # PORT VARIABLE HAS NO EFFECT (as of January 2019)
calibreweb_url: /books
calibreweb_home: "{{ content_base }}/calibre-web" # /library/calibre-web

# Internet Archive Decentralized Web - create your own offline version box:4244
# (or http://box/archive) arising from digital library https://dweb.archive.org
internetarchive_install: True
internetarchive_enabled: True

# Minetest is an open source clone of the Minecraft building blocks game
minetest_install: True
minetest_enabled: True


@ -313,6 +313,11 @@ calibreweb_port: 8083 # PORT VARIABLE HAS NO EFFECT (as of January 2019)
calibreweb_url: /books
calibreweb_home: "{{ content_base }}/calibre-web" # /library/calibre-web

# Internet Archive Decentralized Web - create your own offline version box:4244
# (or http://box/archive) arising from digital library https://dweb.archive.org
internetarchive_install: False
internetarchive_enabled: False

# Minetest is an open source clone of the Minecraft building blocks game
minetest_install: False
minetest_enabled: False


@ -313,6 +313,11 @@ calibreweb_port: 8083 # PORT VARIABLE HAS NO EFFECT (as of January 2019)
calibreweb_url: /books
calibreweb_home: "{{ content_base }}/calibre-web" # /library/calibre-web

# Internet Archive Decentralized Web - create your own offline version box:4244
# (or http://box/archive) arising from digital library https://dweb.archive.org
internetarchive_install: False
internetarchive_enabled: False

# Minetest is an open source clone of the Minecraft building blocks game
minetest_install: False
minetest_enabled: False