mirror of https://github.com/iiab/iiab.git synced 2025-03-09 15:40:17 +00:00

Minor cleanup

This commit is contained in:
Mitra Ardron 2019-10-16 17:03:38 +11:00
parent 7ea25a3a63
commit e512cea92a

@ -1,10 +1,10 @@
# Offline Internet Archive README
The Internet Archive offers perhaps the world's largest online store of open content.
The wisdom of the ages, just a few clicks away. As Wikipedia has become the world's encyclopedia,
the Internet Archive has become its library.
Central to our mission is establishing “Universal Access to All Knowledge”.
Access to our library of millions of books, journals, audio and video recordings and beyond is free to anyone.
This Ansible role installs the Internet Archive's dweb-mirror project on
Internet-in-a-Box (IIAB). Use this to build up a dynamic offline library
@ -12,19 +12,18 @@ arising from the materials you can explore at http://dweb.archive.org
The Offline Internet Archive server:
* Crawls Internet Archive collections to a local server,
* Serves that content locally,
* Caches content while browsing,
* Moves content between servers by sneakernet — on disks, USB sticks, and SD cards,
* Delivers (mostly) the Internet Archive UI offline in javascript in the browser,
* Is open source,
* And is being made available in other languages.
## Starting server
The server is started and restarted automatically. It can be turned on or off
at a terminal window with `service internetarchive start` or `service internetarchive stop`.
## Browsing
@ -34,7 +33,7 @@ Open the web page at [http://box:4244](http://box:4244) or
if you don't have name resolution set up to reach your Internet-in-a-Box).
There are several aspects to managing content on the Internet Archive's Universal Library, which are covered below.
These include crawling content to your own system, or to an external drive suitable for moving to another system,
and managing a collection of material on the Archive that others can download automatically.
Try walking through the following steps to get a tour of the system and understand more about:
@ -47,7 +46,7 @@ Try walking through the following steps to get a tour of the system and understa
* Downloading content for a different box
* Managing collections on Internet Archive
Or you can click `Home` or the Internet Archive logo,
if you just want to explore the Internet Archive's resources.
## Using the page
@ -64,17 +63,16 @@ while online, and don't appear when offline.
Below that is a row of information specific to the offline application.
First are health indicators.
* If it shows "Mirror" in Red, it means we can't communicate with the mirror gateway;
this will only happen if the gateway goes offline part way through a process.
* Normally you'll see an indicator for GATEWAY, which is Green when the gateway can talk to the Archive,
and Red when you are offline.
* Then comes an indicator for this page, whether it is being crawled, and if so approximately how much has been stored.
* If the mirror is online to the Internet Archive (GATEWAY shows Green), then next comes a "Reload" button;
click it to force the mirror to check with the Archive for an up-to-date list.
It is most useful on collections when someone else might have added something
but your gateway might be remembering an old version.
* Then there is a Settings button which brings up a page that includes status of any crawls.
* Finally there is a Home button which will bring you back to this page.
@ -82,10 +80,11 @@ Each tile on this page represents an item that your server will check for when i
The first time you access the server this will depend on what was installed on the server, and it might be empty.
Notice that most of the tiles should have a White, Green or Blue dot in the top right to indicate that you are crawling them.
* A White dot means enough of the item has been downloaded to view it offline.
* The Green dot indicates that we are checking this item each time we crawl and getting enough to display offline.
* A Blue dot indicates we are crawling all the content of the item; this could be a lot of data,
for example a full resolution version of the video. It's rare that you'll use this.
This button also shows how much has been downloaded: for an item it's the total size of downloaded files/pages,
for a collection it's the total amount in all collection members.
@ -104,6 +103,7 @@ and a total of 400Mb is downloaded in this collection. (Which includes some file
If you click on an item that is already downloaded (Blue, Green or White dot), it will be displayed offline;
the behavior depends on the kind of item.
* Images are displayed and saved for offline use
* Books display in a flip book format, pages you look at will be saved for offline use.
* Video and Audio will play immediately and you can skip around in them as normal
@ -126,6 +126,7 @@ These should include any inserted drive with "archiveorg" as a directory at its
The content will be copied to that drive, which can then be removed and inserted into a different server.
The server checks whether these disks are present every 15 seconds, so to use a new USB disk:
* Insert the USB
* Create a folder at its top level called `archiveorg`
* Wait about 15 seconds
@ -152,7 +153,7 @@ This page is still under development (as of June 2019).
On here you will see a list of crawls.
You should get useful information about status, any errors etc.
Hitting `<<` will restart the crawl, and `||` and `>` pause and resume,
but note that any file already being downloaded will continue to download when you hit pause.
Hitting `||` `<<` `>` will stop the current crawl, reset and retry, which is a good way to try again if,
for example, you lost connection to the server part way through.
@ -167,7 +168,7 @@ In a shell
```
sudo sh
```
cd into the location for your installation
```
cd /opt/iiab/internetarchive/node_modules/@internetarchive/dweb-mirror
```
@ -175,7 +176,7 @@ Perform a standard crawl
```
./internetarchive --crawl
```
To fetch the "foobar" item from IA
```
./internetarchive --crawl foobar
```
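If you have several identifiers to fetch, a small wrapper can run the same command once per item. This is a minimal sketch and is not part of dweb-mirror. The function name `crawl_list`, the `CRAWLER` variable and the one-identifier-per-line list file are all assumptions for illustration. The wrapper only runs `./internetarchive --crawl <identifier>` as shown above, and it assumes you are already in the installation directory.

```
# Hypothetical helper (not part of dweb-mirror): crawl every identifier
# listed in a file, one per line. Blank lines and lines starting with "#"
# are skipped. Run it from the installation directory.
CRAWLER="${CRAWLER:-./internetarchive}"   # assumption: path to the crawler script

crawl_list() {
  list_file="$1"
  if [ ! -r "$list_file" ]; then
    echo "cannot read $list_file" >&2
    return 1
  fi
  # "|| [ -n ... ]" also handles a last line with no trailing newline
  while IFS= read -r id || [ -n "$id" ]; do
    case "$id" in
      ''|'#'*) continue ;;                 # skip blanks and comments
    esac
    # Redirect stdin so the crawler cannot swallow the rest of the list
    "$CRAWLER" --crawl "$id" < /dev/null || echo "crawl failed: $id" >&2
  done < "$list_file"
}
```

For example, with `foobar` and `kenyanhistory` on separate lines of `items.txt`, running `crawl_list items.txt` crawls each item in turn. A failed crawl is reported, and the loop carries on with the next item.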
@ -194,11 +195,11 @@ To get a full list of possible arguments and some more examples
If you have access to the command line on the server, then there is a lot more you can do with the crawler.
The items selected for crawling (Green or Blue dots) are stored in a file `dweb-mirror.config.yaml`
in the home directory of the server, e.g. on IIAB it's in `/root/dweb-mirror.config.yaml`
and on your laptop it's probably in `~/dweb-mirror.config.yaml`.
You can edit this file with care!
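Because this file is edited by hand, it may help to keep a copy before each edit so that a mistake can be rolled back. This is a minimal sketch and is not part of dweb-mirror. The function name `backup_config` and the timestamped `.bak` naming are assumptions. The default path is the IIAB location given above.

```
# Hypothetical helper (not part of dweb-mirror): copy the config to a
# timestamped backup before you edit it, and print the backup's path.
backup_config() {
  cfg="${1:-/root/dweb-mirror.config.yaml}"   # IIAB location by default
  if [ ! -f "$cfg" ]; then
    echo "no such file: $cfg" >&2
    return 1
  fi
  backup="$cfg.$(date +%Y%m%d%H%M%S).bak"
  # -p keeps the original permissions and timestamps
  cp -p "$cfg" "$backup" && printf '%s\n' "$backup"
}
```

To undo a bad edit, copy the printed backup file back over `dweb-mirror.config.yaml`.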
From the command line, cd into the installation
```
cd /opt/iiab/internetarchive/node_modules/@internetarchive/dweb-mirror
./internetarchive --crawl
@ -220,17 +221,20 @@ To put content onto a device, you can either:
* hit `Save` while on an item or search
* or run a crawl at the command line
cd into your device e.g. on an IIAB it would be
```
cd /media/pi/foo
```
Create a directory to use for the content; it must be called `archiveorg`
```
mkdir archiveorg
```
cd to the installation
```
cd /opt/iiab/internetarchive/node_modules/@internetarchive/dweb-mirror
```
Copy the current crawl to the directory
```
./internetarchive --crawl --copydirectory /media/pi/foo/archiveorg
```
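The steps for putting content onto a device can be combined into one function. This is a hedged sketch rather than a supported tool. The function name `copy_crawl_to_drive` is hypothetical, and the default installation directory is an assumption, so pass your own as the second argument if it differs. The function runs the same `--crawl --copydirectory` command shown above.

```
# Hypothetical helper (not part of dweb-mirror): prepare a mounted drive and
# copy the current crawl onto it.
#   $1  mount point of the drive, e.g. /media/pi/foo
#   $2  installation directory (assumed default below; adjust for your box)
copy_crawl_to_drive() {
  mount_point="$1"
  install_dir="${2:-/opt/iiab/internetarchive/node_modules/@internetarchive/dweb-mirror}"
  if [ ! -d "$mount_point" ]; then
    echo "not a directory: $mount_point" >&2
    return 1
  fi
  # The content directory at the top level of the drive must be called "archiveorg"
  mkdir -p "$mount_point/archiveorg" || return 1
  # Run in a subshell so the caller's working directory is unchanged
  ( cd "$install_dir" && ./internetarchive --crawl --copydirectory "$mount_point/archiveorg" )
}
```

For example, run `copy_crawl_to_drive /media/pi/foo` as root. Once it returns successfully, the drive can be unplugged.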
When it's finished, you can unplug the USB drive and plug it into any other device
@ -247,11 +251,11 @@ and just checks the content is up to date.
You can create and manage your own collections on the [Internet Archive site](http://www.archive.org).
Other people can then crawl those collections.
First get in touch with Mitra Ardron at `mitra@archive.org`, as processes may have changed since this was written.
You'll need to create an account for yourself at [archive.org](https://archive.org).
We'll set up a collection for you of type `texts`; don't worry, you can put any kind of media in it.
Once you have a collection, let's say `kenyanhistory`,
you can upload materials to the Archive by hitting the Upload button and following the instructions.
@ -260,15 +264,15 @@ You can also add any existing material on the Internet Archive to this collectio
* Find the material you are looking for
* You should see a URL like `https://archive.org/details/foobar`
* Copy the identifier which in this case would be `foobar`
* Go to `https://archive.org/services/simple-lists-admin/?identifier=kenyanhistory&list_name=items`
replacing `kenyanhistory` with the name of your collection.
* Enter the name of the item `foobar` into the box and click `Add`.
* It might take a few minutes to show up, you can add other items while you wait.
* The details page for the collection, `https://archive.org/details/kenyanhistory`, should then show your new item.
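Copying the identifier and editing the admin URL can also be done in a shell. This small sketch uses hypothetical function names. It only manipulates the URL patterns shown in the list above and does not contact archive.org.

```
# Hypothetical helpers (not part of any Internet Archive tool).

# Print the identifier from a details URL such as https://archive.org/details/foobar
identifier_from_url() {
  case "$1" in
    */details/*) ;;
    *) echo "not a details URL: $1" >&2; return 1 ;;
  esac
  id="${1#*/details/}"      # drop everything up to and including /details/
  id="${id%%[/?#]*}"        # drop any trailing path, query string or fragment
  printf '%s\n' "$id"
}

# Print the simple-lists admin page used to add items to a collection
list_admin_url() {
  printf 'https://archive.org/services/simple-lists-admin/?identifier=%s&list_name=items\n' "$1"
}
```

For example, `identifier_from_url https://archive.org/details/foobar` prints `foobar`. Likewise, `list_admin_url kenyanhistory` prints the admin URL given in the steps above.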
On the device, you can go to `kenyanhistory` and should see `foobar`.
Hit `Refresh` and `foobar` should show up.
If `kenyanhistory` is marked for crawling, it should update automatically.
## Administration
@ -281,7 +285,7 @@ Administration tools are under `Settings`.
Click on the Archive logo, in the center-top, to get the Internet
Archive main interface if connected to the net.
While viewing an item or collection, the `Crawl` button in the top bar
indicates whether the item is being crawled or not. Clicking it will cycle
through three levels:
@ -310,7 +314,7 @@ directories found, but are read from any of the locations.
If your disk space is getting full, it's perfectly safe to delete any
subdirectories (except `archiveorg/.hashstore`), and the server will refetch anything else it needs
next time you browse to the item while connected to the internet.
It's also safe to move directories to an attached USB drive
(underneath an `archiveorg` directory at the top level of the disk).
It is also safe to move attached USB drives from one device to another.
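Moving an item by hand amounts to a `mv` into the drive's `archiveorg` directory. The sketch below is hypothetical: the function name is invented, and you need to know where your installation keeps its local `archiveorg` cache. It refuses to touch `.hashstore`, which the text above says must be kept, and it will not overwrite an item that is already on the drive.

```
# Hypothetical helper: move one cached item directory onto a USB drive.
#   $1  local archiveorg cache directory (check where your install keeps it)
#   $2  item identifier, i.e. the subdirectory name, e.g. foobar
#   $3  mount point of the USB drive, e.g. /media/pi/foo
move_item_to_drive() {
  cache_dir="$1"; item="$2"; drive="$3"
  case "$item" in
    ''|.|..|.hashstore|*/*)
      echo "refusing to move '$item'" >&2
      return 1 ;;
  esac
  if [ ! -d "$cache_dir/$item" ]; then
    echo "no such item: $cache_dir/$item" >&2
    return 1
  fi
  dest="$drive/archiveorg"
  # Do not merge into, or nest inside, an existing copy on the drive
  if [ -e "$dest/$item" ]; then
    echo "already on drive: $dest/$item" >&2
    return 1
  fi
  mkdir -p "$dest" && mv "$cache_dir/$item" "$dest/"
}
```

The server reads items from any of its locations. The moved item therefore stays available while the drive is attached.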
@ -322,10 +326,13 @@ but most of it works now.
This is useful if you are worried about corruption, or after, for example, hand-editing or
moving cached items around.
Run everything as root
```
sudo su
```
cd into the location for your installation
```
cd /opt/iiab/internetarchive/node_modules/@internetarchive/dweb-mirror
./internetarchive -m
```
@ -343,9 +350,9 @@ the `/info` call.
In the Repo is a
[default YAML file](https://github.com/internetarchive/dweb-mirror/blob/master/configDefaults.yaml)
which is commented. It would be a bad idea to edit this, so I'm not going to
tell you where it is on your installation! But anything from this file can be
overridden by lines in `/root/dweb-mirror.config.yaml`. Make sure you
understand how YAML works before editing this file; if you break it, you can
copy a new default from
[dweb-mirror.config.yaml on the repo](https://github.com/internetarchive/dweb-mirror/blob/master/dweb-mirror.config.yaml).