mirror of https://github.com/iiab/iiab.git
synced 2025-03-09 15:40:17 +00:00

Minor cleanup

This commit is contained in: parent 7ea25a3a63, commit e512cea92a

1 changed file with 51 additions and 44 deletions
@ -1,10 +1,10 @@
# Offline Internet Archive README

The Internet Archive offers perhaps the world’s largest online store of open content.
The wisdom of the ages, just a few clicks away. As Wikipedia has become the world’s encyclopedia,
the Internet Archive has become its library.
Central to our mission is establishing “Universal Access to All Knowledge”.
Access to our library of millions of books, journals, audio and video recordings and beyond is free to anyone.

This Ansible role installs the Internet Archive's dweb-mirror project on
Internet-in-a-Box (IIAB). Use this to build up a dynamic offline library
@ -12,19 +12,18 @@ arising from the materials you can explore at http://dweb.archive.org
The Offline Internet Archive server:

* Crawls Internet Archive collections to a local server,
* Serves that content locally,
* Caches content while browsing,
* Moves content between servers by sneakernet (on disks, USB sticks, and SD cards),
* Delivers (mostly) the Internet Archive UI offline in JavaScript in the browser,
* Is open source,
* And is being made available in other languages.

## Starting server

The server is started and restarted automatically. It can be turned on or off
at a terminal window with `service internetarchive start` or `service internetarchive stop`.
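If you script this, a small wrapper keeps the service name in one place. The sketch below is POSIX `sh`. The function name `ia_service` and the `SERVICE_CMD` override are hypothetical. It also assumes that the usual `restart` and `status` actions are available for the `internetarchive` service, in addition to the documented `start` and `stop`.

```shell
# Hypothetical helper (not part of IIAB): run one action on the
# internetarchive service. SERVICE_CMD defaults to the system `service`
# command; set it to something else (e.g. echo) for a dry run.
ia_service() {
  case "$1" in
    start|stop|restart|status)
      ${SERVICE_CMD:-service} internetarchive "$1"
      ;;
    *)
      # Refuse anything else rather than passing it to `service`.
      echo "usage: ia_service start|stop|restart|status" >&2
      return 2
      ;;
  esac
}
```

With `SERVICE_CMD=echo` set, `ia_service stop` prints `internetarchive stop` instead of stopping anything. This is handy for checking a script before running it as root.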

## Browsing

@ -34,7 +33,7 @@ Open the web page at [http://box:4244](http://box:4244) or
if you don't have name resolution set up to reach your Internet-in-a-Box).

There are several aspects to managing content on the Internet Archive’s Universal Library, each covered below.
These include crawling content to your own system or to an external drive suitable for moving to another system,
and managing a collection of material on the Archive that others can download automatically.

Try walking through the following steps to get a tour of the system and understand more about:
@ -47,7 +46,7 @@ Try walking through the following steps to get a tour of the system and understa
* Downloading content for a different box
* Managing collections on the Internet Archive

Or you can click `Home` or the Internet Archive logo
if you just want to explore the Internet Archive's resources.

## Using the page
@ -64,17 +63,16 @@ while online, and don't appear when offline.
Below that is a row of information specific to the offline application.

First are health indicators.

* If it shows "Mirror" in Red, it means we can't communicate with the mirror gateway;
this will only happen if the gateway goes offline part way through a process.
* Normally you'll see an indicator for GATEWAY, which is Green when the gateway can talk to the Archive,
and Red when you are offline.

* Then comes an indicator for this page, showing whether it is being crawled and, if so, approximately how much has been stored.

* If the mirror is online to the Internet Archive (GATEWAY shows Green), then next comes a "Reload" button;
you can click this to force it to check with the Archive for an up-to-date list.
It is most useful on collections when someone else might have added something
but your gateway might be remembering an old version.
* Then there is a Settings button, which brings up a page that includes the status of any crawls.
* Finally there is a Home button, which will bring you back to this page.
@ -82,10 +80,11 @@ Each tile on this page represents an item that your server will check for when i
The first time you access the server, this will depend on what was installed on the server, and it might be empty.

Notice that most of the tiles should have a White, Green or Blue dot in the top right to indicate that you are crawling them.

* A White dot means enough of the item has been downloaded to be viewed offline.
* A Green dot indicates that we are checking this item each time we crawl and getting enough to display offline.
* A Blue dot indicates we are crawling all the content of the item; this could be a lot of data,
for example a full resolution version of the video. It's rare that you’ll use this.

This button also shows how much has been downloaded: for an item, it's the total size of downloaded files/pages;
for a collection, it's the total amount in all collection members.
@ -104,6 +103,7 @@ and a total of 400Mb is downloaded in this collection. (Which includes some file
If you click on an item that is already downloaded (Blue, Green or White dot), it will be displayed offline;
the behavior depends on the kind of item.

* Images are displayed and saved for offline use.
* Books display in a flip book format; pages you look at will be saved for offline use.
* Video and Audio will play immediately and you can skip around in them as normal.
@ -126,6 +126,7 @@ These should include any inserted drive with "archiveorg" as a directory at its
The content will be copied to that drive, which can then be removed and inserted into a different server.

The server checks whether these disks are present every 15 seconds, so to use a new USB disk:

* Insert the USB
* Create a folder at its top level called `archiveorg`
* Wait about 15 seconds
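To see which mounted drives already meet the rule above (an `archiveorg` directory at the top level), a check along these lines may help. It is a POSIX `sh` sketch, not the server's own detection code. `ia_find_drives` is a hypothetical name, and it assumes drives are mounted directly under the directory you pass in.

```shell
# Hypothetical helper: print each DRIVE/archiveorg directory found for
# drives mounted directly under BASE (default /media). This mimics the
# documented rule only; the server's own check may differ.
ia_find_drives() {
  base=${1:-/media}
  for d in "$base"/*/archiveorg; do
    # If nothing matches, the glob stays literal and -d fails.
    if [ -d "$d" ]; then
      printf '%s\n' "$d"
    fi
  done
}
```

For drives that appear as `/media/pi/DRIVE`, run `ia_find_drives /media/pi`.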
@ -152,7 +153,7 @@ This page is still under development (as of June 2019).
Here you will see a list of crawls,
with useful information about their status, any errors, etc.
Hitting `<<` will restart the crawl, and `||` and `>` pause and resume it,
but note that any file already being downloaded will continue to do so when you hit pause.
Hitting `||` `<<` `>` will stop the current crawl, reset and retry, which is a good way to try again if,
for example, you lost connection to the server part way through.
@ -167,7 +168,7 @@ In a shell
```
sudo sh
```
cd into the location for your installation:
```
cd /opt/iiab/internetarchive/node_modules/@internetarchive/dweb-mirror
```
@ -175,7 +176,7 @@ Perform a standard crawl
```
./internetarchive --crawl
```
To fetch the "foobar" item from IA:
```
./internetarchive --crawl foobar
```
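To crawl several specific items in one go, a safe approach is a loop that calls the documented `--crawl IDENTIFIER` form once per item, since it does not assume that `--crawl` accepts several identifiers at once. `ia_crawl_ids` and the `IA_CMD` override are hypothetical names. Run it from the installation directory, as above.

```shell
# Hypothetical helper: crawl each identifier given, one run per item.
# IA_CMD defaults to ./internetarchive; override it for a dry run.
ia_crawl_ids() {
  rc_all=0
  for id in "$@"; do
    # Keep going if one item fails, but remember that something failed.
    ${IA_CMD:-./internetarchive} --crawl "$id" || rc_all=1
  done
  return "$rc_all"
}
```

For example, `ia_crawl_ids foobar kenyanhistory`. It carries on past a failed item and returns 1 at the end if any crawl failed.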
@ -194,11 +195,11 @@ To get a full list of possible arguments and some more examples
If you have access to the command line on the server, then there is a lot more you can do with the crawler.

The items selected for crawling (Green or Blue dots) are stored in a file `dweb-mirror.config.yaml`
in the home directory of the server, e.g. on IIAB it's in `/root/dweb-mirror.config.yaml`
and on your laptop it's probably in `~/dweb-mirror.config.yaml`.
You can edit this file with care!
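A broken `dweb-mirror.config.yaml` affects what gets crawled, so it may be worth keeping a dated copy before each hand edit. The following is a minimal sketch that assumes the IIAB path given above. `ia_backup_config` is a hypothetical helper, not part of dweb-mirror.

```shell
# Hypothetical helper: copy the crawl config aside before hand-editing it.
# Defaults to the IIAB location; pass another path (e.g. ~/dweb-mirror.config.yaml).
ia_backup_config() {
  cfg=${1:-/root/dweb-mirror.config.yaml}
  if [ ! -f "$cfg" ]; then
    echo "no config file at $cfg" >&2
    return 1
  fi
  # Timestamped name, so earlier backups are not overwritten.
  backup="$cfg.$(date +%Y%m%d%H%M%S).bak"
  cp -p "$cfg" "$backup" && printf '%s\n' "$backup"
}
```

It prints the path of the copy. To undo a bad edit, copy that file back over the original.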

From the command line, cd into the installation and run the crawler:
```
cd /opt/iiab/internetarchive/node_modules/@internetarchive/dweb-mirror
./internetarchive --crawl
```
@ -220,17 +221,20 @@ To put content onto a device, you can either:
* hit `Save` while on an item or search,
* or run a crawl at the command line.

cd into your device, e.g. on an IIAB it would be:
```
cd /media/pi/foo
```
Create a directory to use for the content; it must be called `archiveorg`:
```
mkdir archiveorg
```
cd to the installation:
```
cd /opt/iiab/internetarchive/node_modules/@internetarchive/dweb-mirror
```
Copy the current crawl to the directory:
```
./internetarchive --crawl --copydirectory /media/pi/foo/archiveorg
```
When it's finished, you can unplug the USB drive and plug it into any other device.
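The same steps can be collected into one function. This is a sketch only. `ia_copy_to_drive`, `IA_DIR` and `IA_CMD` are hypothetical names, and `IA_DIR` defaults to the `@internetarchive/dweb-mirror` installation path used elsewhere in this README; adjust it if yours differs. The only crawler flags it uses are the documented `--crawl --copydirectory`.

```shell
# Hypothetical helper: ensure the drive has a top-level archiveorg
# directory, then copy the current crawl into it.
# IA_DIR: installation directory; IA_CMD: crawler command (dry-run override).
ia_copy_to_drive() {
  drive=$1
  if [ ! -d "$drive" ]; then
    echo "usage: ia_copy_to_drive MOUNTED_DRIVE_DIR" >&2
    return 2
  fi
  # The directory must be called "archiveorg" for the server to find it.
  mkdir -p "$drive/archiveorg" || return 1
  # Run in a subshell so the caller's working directory is left alone.
  (
    cd "${IA_DIR:-/opt/iiab/internetarchive/node_modules/@internetarchive/dweb-mirror}" &&
      ${IA_CMD:-./internetarchive} --crawl --copydirectory "$drive/archiveorg"
  )
}
```

For the example drive, run `ia_copy_to_drive /media/pi/foo`.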
@ -247,11 +251,11 @@ and just checks the content is up to date.
You can create and manage your own collections on the [Internet Archive site](http://www.archive.org).
Other people can then crawl those collections.

First get in touch with Mitra Ardron at `mitra@archive.org`, as processes may have changed since this was written.

You'll need to create an account for yourself at [archive.org](https://archive.org).

We'll set up a collection for you of type `texts`; don't worry, you can put any kind of media in it.

Once you have a collection, let's say `kenyanhistory`,
you can upload materials to the Archive by hitting the Upload button and following the instructions.
|
@ -260,15 +264,15 @@ You can also add any existing material on the Internet Archive to this collectio
|
|||
|
||||
* Find the material you are looking for
|
||||
* You should see a URL like `https://archive.org/details/foobar`
|
||||
* Copy the identifier which in this case would be 'foobar'
|
||||
* Copy the identifier which in this case would be `foobar`
|
||||
* Go to `https://archive.org/services/simple-lists-admin/?identifier=kenyanhistory&list_name=items`
|
||||
replacing `kenyanhistory` with the name of your collection.
|
||||
* Enter the name of the item `foobar` into the box and click "Add".
|
||||
* Enter the name of the item `foobar` into the box and click `Add`.
|
||||
* It might take a few minutes to show up, you can add other items while you wait.
|
||||
* The details page for the collection should then show your new item `https://archive.org/details/kenyanhistory`
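When you are adding items to several collections, you can generate the admin URL from the steps above instead of editing it by hand. The following is a sketch. `ia_list_admin_url` is a hypothetical name. The function deliberately rejects identifiers containing anything other than letters, digits, `.`, `_` and `-` rather than URL-encoding them. This assumes collection identifiers stick to those characters.

```shell
# Hypothetical helper: print the simple-lists-admin URL for a collection,
# following the URL pattern in the steps above.
ia_list_admin_url() {
  case $1 in
    '' | *[!A-Za-z0-9._-]*)
      # Empty, or contains characters that would need URL-encoding.
      echo "usage: ia_list_admin_url COLLECTION_IDENTIFIER" >&2
      return 2
      ;;
  esac
  printf 'https://archive.org/services/simple-lists-admin/?identifier=%s&list_name=items\n' "$1"
}
```

`ia_list_admin_url kenyanhistory` prints the same URL as the one in the list above.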

On the device, go to `kenyanhistory`; you should see `foobar`.
If it isn't there yet, hit `Refresh` and `foobar` should show up.
If `kenyanhistory` is marked for crawling, it should update automatically.

## Administration
@ -281,7 +285,7 @@ Administration tools are under `Settings`.
Click on the Archive logo, in the center-top, to get the Internet
Archive main interface if connected to the net.

While viewing an item or collection, the `Crawl` button in the top bar
indicates whether the item is being crawled or not. Clicking it will cycle
through three levels:
@ -310,7 +314,7 @@ directories found, but are read from any of the locations.
If your disk space is getting full, it's perfectly safe to delete any
subdirectories (except `archiveorg/.hashstore`), and the server will refetch anything else it needs
next time you browse to the item while connected to the internet.
It's also safe to move directories to an attached USB drive
(underneath an `archiveorg` directory at the top level of the disk).
It is also safe to move attached USB drives from one device to another.
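Before deleting anything, it helps to see which item subdirectories take the most space. Here is a POSIX `sh` sketch, assuming you pass the path of an `archiveorg` directory. `ia_cache_usage` is a hypothetical name. It only reports sizes and leaves the deletion to you.

```shell
# Hypothetical helper: list each item subdirectory of an archiveorg
# directory with its size in KiB, largest first.
# The * glob skips dot-directories, so .hashstore is never listed.
ia_cache_usage() {
  dir=$1
  if [ ! -d "$dir" ]; then
    echo "usage: ia_cache_usage ARCHIVEORG_DIR" >&2
    return 2
  fi
  for d in "$dir"/*/; do
    # If nothing matches, the glob stays literal and -d fails.
    [ -d "$d" ] || continue
    printf '%s\t%s\n' "$(du -sk "$d" | cut -f1)" "$(basename "$d")"
  done | sort -rn
}
```

Sizes come from `du -k`, so they are in KiB. Because `.hashstore` never appears in the list, everything shown is covered by the "safe to delete" rule above.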
@ -322,10 +326,13 @@ but most of it works now.

If you are worried about corruption, or after (for example) hand-editing or
moving cached items around, use the following steps.

Run everything as root:
```
sudo su
```
cd into the location for your installation:
```
cd /opt/iiab/internetarchive/node_modules/@internetarchive/dweb-mirror
./internetarchive -m
```
@ -343,9 +350,9 @@ the `/info` call.

In the repo is a
[default YAML file](https://github.com/internetarchive/dweb-mirror/blob/master/configDefaults.yaml)
which is commented. It would be a bad idea to edit this, so I'm not going to
tell you where it is on your installation! But anything from this file can be
overridden by lines in `/root/dweb-mirror.config.yaml`. Make sure you
understand how YAML works before editing this file; if you break it, you can
copy a new default from
[dweb-mirror.config.yaml on the repo](https://github.com/internetarchive/dweb-mirror/blob/master/dweb-mirror.config.yaml).