Mirror of https://github.com/iiab/iiab.git (synced 2025-03-09 15:40:17 +00:00)

internetarchive: Update README

Parent: b524d93fdc
Commit: 7ea25a3a63
1 changed file with 275 additions and 130 deletions

@@ -1,71 +1,284 @@

# Offline Internet Archive

The Internet Archive offers perhaps the world’s largest online store of open content.
The wisdom of the ages, just a few clicks away. As Wikipedia has become the world’s encyclopedia,
the Internet Archive has become its library.

Central to our mission is establishing “Universal Access to All Knowledge”.
Access to our library of millions of books, journals, audio and video recordings and beyond is free to anyone.

This Ansible role installs the Internet Archive's dweb-mirror project on
Internet-in-a-Box (IIAB). Use this to build up a dynamic offline library
arising from the materials you can explore at http://dweb.archive.org.

The Offline Internet Archive server:

* Crawls Internet Archive collections to a local server.
* Serves that content locally.
* Caches content while browsing.
* Moves content between servers by sneakernet: on disks, USB sticks, and SD cards.
* Delivers (mostly) the Internet Archive UI offline, in JavaScript in the browser.
* Is open source.
* Is being made available in other languages.

## Starting server

The server is started and restarted automatically. It can be turned on or off
at a terminal window with `service internetarchive start` or
`service internetarchive stop`.

## Browsing

Open the web page at [http://box:4244](http://box:4244) or
[http://box.lan:4244](http://box.lan:4244) (try
[http://box.local:4244](http://box.local:4244) via mDNS over a local network,
if you don't have name resolution set up to reach your Internet-in-a-Box).

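
If you are not sure which of the three addresses resolves on your network, a small shell function can try them in order. This is only a sketch. It assumes `curl` is installed on the machine you are testing from, and the function name `first_reachable` is hypothetical, not part of dweb-mirror.

```sh
# Hypothetical helper: print the first URL that answers an HTTP request,
# or return 1 if none of them do. Assumes curl is installed.
first_reachable() {
    for url in "$@"; do
        # Any HTTP response (even an error page) counts as "answering";
        # each probe gives up after 3 seconds.
        if curl --silent --output /dev/null --max-time 3 "$url"; then
            echo "$url"
            return 0
        fi
    done
    return 1
}
```

For example, `first_reachable http://box:4244 http://box.lan:4244 http://box.local:4244` prints whichever address you should open in the browser.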
There are several aspects to managing content on the Internet Archive’s Universal Library, covered below.
These include crawling content to your own system or to an external drive suitable for moving to another system,
and managing a collection of material on the Archive that others can download automatically.

Try walking through the following steps to get a tour of the system and understand more about:

* Using the interface
* Details page - viewing a single item
* Collection and Search pages - multiple items
* Accessing Internet Archive resources
* Managing crawling
* Downloading content for a different box
* Managing collections on Internet Archive

Or you can click `Home` or the Internet Archive logo
if you just want to explore the Internet Archive's resources.

## Using the page

Whichever of the addresses above works, it should bring you to your `local` start page.
You can get back here at any time via the `Local` button.

If you have used the Internet Archive then the interface will be familiar,
but there are a few differences to support offline use.

At the top you'll see the Internet Archive's usual interface. A few of these buttons will (for now) only work
while online, and don't appear when offline.

Below that is a row of information specific to the offline application.

First are health indicators.

* If it shows "Mirror" in Red, it means we can't communicate with the mirror gateway;
  this will only happen if the gateway goes offline part way through a process.
* Normally you'll see an indicator for GATEWAY, which is Green when the gateway can talk to the Archive,
  and Red when you are offline.
* Then comes an indicator for this page: whether it is being crawled, and if so, approximately how much has been stored.
* If the mirror is online to the Internet Archive (GATEWAY shows Green), next comes a "Reload" button.
  Click this to force it to check with the Archive for an up-to-date list.
  It is most useful on collections when someone else might have added something
  but your gateway might be remembering an old version.
* Then there is a Settings button, which brings up a page that includes the status of any crawls.
* Finally there is a Home button, which will bring you back to this page.

Each tile on this page represents an item that your server will check for when it “crawls”.
The first time you access the server, this will depend on what was installed on the server, and it might be empty.

Notice that most of the tiles should have a White, Green or Blue dot in the top right to indicate that you are crawling them.

* A White dot means enough of the item has been downloaded for it to be viewed offline.
* A Green dot indicates that we are checking this item each time we crawl and getting enough to display offline.
* A Blue dot indicates we are crawling all the content of the item. This could be a lot of data,
  for example a full-resolution version of the video. It's rare that you’ll use this.

This button also shows how much has been downloaded: for an item it's the total size of downloaded files/pages,
and for a collection it's the total amount in all collection members.

Tiles come in two types. Most show items that can be displayed (books, videos, audio etc.);
clicking on these will display the item.

Some of the tiles will show a collection, which is a group of items that someone has collected together.
Most likely there will be at least one collection relevant to your project put on the page during installation.

A collection tile shows you how many items are in the collection and how many have been downloaded.
For example, "400MB in 10 of 123 items" means 10 of the 123 items in the collection are downloaded sufficiently to view offline,
and a total of 400MB is downloaded in this collection (which includes some files, like thumbnails, in other items).

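
The collection summary is simple arithmetic over three numbers. The sketch below reproduces the example summary above (400 megabytes in 10 of 123 items) from raw counts. Several details are assumptions made for illustration and are not taken from the dweb-archive UI: the helper name `collection_summary`, treating 1 MB as 1024 × 1024 bytes, and rounding to whole megabytes.

```sh
# Hypothetical helper: format a collection summary in the style the tile shows.
#   $1 = total bytes downloaded for the collection (including files in other items)
#   $2 = number of member items downloaded enough to view offline
#   $3 = total number of member items
collection_summary() {
    awk -v bytes="$1" -v got="$2" -v total="$3" 'BEGIN {
        # Assumption: MB = 1024*1024 bytes, rounded to the nearest whole MB.
        printf "%dMB in %d of %d items\n", int(bytes / 1048576 + 0.5), got, total
    }'
}
```

For example, `collection_summary 419430400 10 123` prints `400MB in 10 of 123 items`.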
## Details page - viewing a single item

If you click on an item that is already downloaded (Blue, Green or White dot), it will be displayed offline.
The behavior depends on the kind of item.

* Images are displayed and saved for offline use.
* Books display in a flip-book format; pages you look at will be saved for offline use.
* Video and audio will play immediately, and you can skip around in them as normal.

The crawl button at the top indicates, in the same way tiles do, whether the object is being crawled and, if not, whether it has been downloaded.
It also shows you (approximately) the total downloaded for this item.

Click on the Crawl button till it turns Green and it will download a full copy of the book, video or audio.
It waits about 30 seconds before doing this, allowing time to cycle back to the desired level of crawling.
These items will also appear on your Local page.
See the note above: usually you won’t want to leave it at Blue (all), as this will usually try
(within some size limits) to download all the files.

There is a Reload button which will force the server to try archive.org.
This is useful if you think the item has changed, or for debugging.

If you want to save this item to a specific disk, for example to put it on a USB drive, click the Save button.
This button brings up a dialogue with a list of the available destinations.
These should include any inserted drive with "archiveorg" as a directory at its top level.
The content will be copied to that drive, which can then be removed and inserted into a different server.

The server checks whether these disks are present every 15 seconds, so to use a new USB disk:

* Insert the USB disk.
* Create a folder at its top level called `archiveorg`.
* Wait about 15 seconds.
* Reload the page you are on.
* Hitting `Save` should now allow this USB disk to be selected.

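
The folder-creation step in the USB steps above can be scripted. This sketch assumes the drive is already mounted and that you pass its mount point, for example `/media/pi/foo` (the example path used elsewhere in this README). The function name `prepare_archive_drive` is hypothetical. It only creates and checks the `archiveorg` folder; detecting the drive is still up to the server.

```sh
# Hypothetical helper: create the top-level "archiveorg" folder the server looks for
# on an already-mounted drive, and confirm it is writable.
# Prints the folder path on success, returns 1 on failure.
prepare_archive_drive() {
    mount_point=$1
    # Refuse to create the folder if the drive is not mounted (the path does not exist).
    if [ ! -d "$mount_point" ]; then
        echo "Not a directory: $mount_point" >&2
        return 1
    fi
    mkdir -p "$mount_point/archiveorg" || return 1
    if [ ! -w "$mount_point/archiveorg" ]; then
        echo "Cannot write to $mount_point/archiveorg" >&2
        return 1
    fi
    echo "$mount_point/archiveorg"
}
```

After running `prepare_archive_drive /media/pi/foo`, wait about 15 seconds and reload the page, as in the steps above.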
## Collection and Search pages - multiple items

If you click on a Collection, we’ll display a grid of tiles for all the items that have been placed in the collection.
White, Green and Blue indicators mean the same as on the Local page.
If you click on the crawl button till it's Green, it will check this collection each time it crawls,
download the tiles for the first page or so, and can be configured to get some of the items as well.

## Accessing Internet Archive resources

The Internet Archive logo tile on the local page will take you to the Archive front page collection.
Content here is probably not already downloaded or crawled,
but can be selected for crawling as for any other item.

## Managing crawling

If you click on the "Settings" button, it should bring up a page of settings to control crawling.
This page is still under development (as of June 2019).

On here you will see a list of crawls,
with useful information about their status, any errors etc.
Hitting `<<` will restart a crawl, `||` pauses it and `>` resumes it,
but note that any file already being downloaded will continue to do so when you hit pause.
Hitting `||` `<<` `>` in turn will stop the current crawl, reset and retry, which is a good way to try again if,
for example, you lost connection to the server part way through.

## Crawling

The crawler runs automatically at startup and when you add something to the crawl,
but it can also be configured through the YAML file described below,
or run at a command line for access to more functionality.

In a shell, become root:

```
sudo sh
```

cd into the location for your installation; on most platforms it is:

```
cd /opt/iiab/internetarchive/node_modules/@internetarchive/dweb-mirror
```

Perform a standard crawl:

```
./internetarchive --crawl
```

To fetch the "foobar" item from IA:

```
./internetarchive --crawl foobar
```

To crawl the top 10 items in the prelinger collection sufficiently to display them, and put
them on a disk plugged into `/media/pi/xyz`:

```
./internetarchive --copydirectory /media/pi/xyz/archiveorg --crawl --rows 10 --level details prelinger
```

To get a full list of possible arguments and some more examples:

```
./internetarchive --help
```

### Advanced crawling

If you have access to the command line on the server, then there is a lot more you can do with the crawler.

The items selected for crawling (Green or Blue dots) are stored in a file `dweb-mirror.config.yaml`
in the home directory of the user running the server: on IIAB it's `/root/dweb-mirror.config.yaml`,
and on your laptop it's probably `~/dweb-mirror.config.yaml`.
You can edit this file, with care!

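
A hand edit can break this file, so it is worth keeping a copy before you change it. This is a minimal sketch. It defaults to the IIAB location `/root/dweb-mirror.config.yaml` given above. The function name `backup_config` and the timestamped `.bak` naming are illustrative choices and are not part of dweb-mirror.

```sh
# Hypothetical helper: copy the dweb-mirror config to a timestamped backup
# before hand-editing it. Prints the backup path.
# Default path is the IIAB location; pass another path as $1 (e.g. ~/dweb-mirror.config.yaml).
backup_config() {
    config=${1:-/root/dweb-mirror.config.yaml}
    if [ ! -f "$config" ]; then
        echo "No config file at $config" >&2
        return 1
    fi
    backup="$config.$(date +%Y%m%d-%H%M%S).bak"
    # -p keeps the original permissions and timestamps on the copy.
    cp -p "$config" "$backup" || return 1
    echo "$backup"
}
```

If an edit goes wrong, copy the printed backup path back over the original file.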
From the command line, cd into the directory holding the service to run the crawler, e.g. on IIAB:

```
cd /opt/iiab/internetarchive/node_modules/@internetarchive/dweb-mirror
./internetarchive --crawl
```

There are lots of options possible; try `./internetarchive --help` to get guidance.

This functionality will be gradually added to the UI in future releases.
In the meantime, if you have something specific you want to do, feel free to post it as a new issue on
[GitHub](https://github.com/internetarchive/dweb-mirror/issues/new).

## Downloading content for a different box

You can copy one or more items that are downloaded to a new storage device (e.g. a USB drive),
take that device to another Universal Library server, and plug it in.
All the content will appear as if it was downloaded there.

To put content onto a device, you can either:

* put the `copydirectory` field in the YAML file described above,
* hit `Save` while on an item or search,
* or run a crawl at the command line:

```
# cd into your device, e.g. on an IIAB it would be
cd /media/pi/foo

# Create a directory to use for the content; it must be called "archiveorg"
mkdir archiveorg

# cd to the installation
cd /opt/iiab/internetarchive/node_modules/@internetarchive/dweb-mirror

# Copy the current crawl to the directory
./internetarchive --crawl --copydirectory /media/pi/foo/archiveorg
```

When it's finished, you can unplug the USB drive and plug it into any other device.

Alternatively, if you want to crawl a specific collection, e.g. `frenchhistory`, to the drive, you would use:

```
./internetarchive --crawl --copydirectory /media/pi/foo/archiveorg frenchhistory
```

If you already have this content on your own device, the crawl is quick
and just checks that the content is up to date.

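
The command-line steps above can be combined into one function. This is a sketch under a few assumptions. The installation directory defaults to the path used in this README, but it can be overridden with a hypothetical `IA_DIR` variable. The function name `crawl_to_drive` is illustrative. The function uses only the `--crawl` and `--copydirectory` options shown above, and any extra arguments (such as a collection identifier) are passed straight through to `./internetarchive`.

```sh
# Hypothetical helper: crawl onto a mounted drive in one step.
#   $1    = mount point of the drive, e.g. /media/pi/foo
#   $2... = optional identifiers to crawl, e.g. frenchhistory
# Run as root, as in the steps above.
crawl_to_drive() {
    drive=$1
    shift
    ia_dir=${IA_DIR:-/opt/iiab/internetarchive/node_modules/@internetarchive/dweb-mirror}
    # Do not create directories on the internal disk if the drive is not mounted.
    [ -d "$drive" ] || { echo "No such drive: $drive" >&2; return 1; }
    # The target directory must be called "archiveorg" at the top level of the drive.
    mkdir -p "$drive/archiveorg" || return 1
    # Run in a subshell so the caller's working directory is unchanged.
    (cd "$ia_dir" && ./internetarchive --crawl --copydirectory "$drive/archiveorg" "$@")
}
```

For example, `crawl_to_drive /media/pi/foo frenchhistory` runs the same crawl as the `frenchhistory` example above.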
## Managing collections on Internet Archive

You can create and manage your own collections on the [Internet Archive site](http://www.archive.org).
Other people can then crawl those collections.

First get in touch with Mitra Ardron at mitra@archive.org, as processes may have changed since this was written.

You'll need to create an account for yourself at [archive.org](https://archive.org).

We'll set up a collection for you of type "texts". Don't worry: you can put any kind of media in it.

Once you have a collection, let's say `kenyanhistory`,
you can upload materials to the Archive by hitting the Upload button and following the instructions.

You can also add any existing material on the Internet Archive to this collection.

* Find the material you are looking for.
* You should see a URL like `https://archive.org/details/foobar`.
* Copy the identifier, which in this case would be `foobar`.
* Go to `https://archive.org/services/simple-lists-admin/?identifier=kenyanhistory&list_name=items`,
  replacing `kenyanhistory` with the name of your collection.
* Enter the name of the item `foobar` into the box and click "Add".
* It might take a few minutes to show up; you can add other items while you wait.
* The details page for the collection, `https://archive.org/details/kenyanhistory`, should then show your new item.
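
The identifier is just the path segment after `/details/` in the URL, and the admin page URL follows the pattern shown in the steps above. Here is a small sketch with two hypothetical helper names, `ia_identifier` and `simple_list_admin_url`. It only builds strings and does not contact archive.org.

```sh
# Hypothetical helper: extract the item identifier from a details URL, e.g.
#   https://archive.org/details/foobar  ->  foobar
ia_identifier() {
    case $1 in
        */details/*) ;;
        *) echo "Not an archive.org details URL: $1" >&2; return 1 ;;
    esac
    id=${1#*/details/}   # drop everything up to and including "/details/"
    id=${id%%[/?#]*}     # drop any trailing path, query string or fragment
    echo "$id"
}

# Hypothetical helper: build the simple-lists admin URL for a collection's "items" list.
simple_list_admin_url() {
    echo "https://archive.org/services/simple-lists-admin/?identifier=$1&list_name=items"
}
```

For example, `ia_identifier https://archive.org/details/foobar` prints `foobar`, and `simple_list_admin_url kenyanhistory` prints the admin URL from the steps above.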

On the device, you can go to `kenyanhistory` and should see `foobar`.
If not, hit Reload and `foobar` should show up.
If `kenyanhistory` is marked for crawling, it should update automatically.

## Administration

Administration is carried out mostly through the same user interface as browsing.

Select `local` from any of the pages to access a display of local content.
Administration tools are under `Settings`.

Click on the Archive logo, in the center-top, to get the Internet
Archive main interface if connected to the net.

While viewing an item or collection, the "Crawl" button in the top bar

@@ -79,29 +292,31 @@ through three levels:

* Full - crawls everything on the item. This can be a LOT of data, including
  full-size videos etc., so use with care if bandwidth/disk is limited.

### Disk storage

The server checks for caches of content in directories called `archiveorg` in
all the likely places. In particular, it looks for any inserted USB drives
on most systems, and if none are found, it uses `~/archiveorg`.

The list of places it checks in an unmodified installation can be seen at
`https://github.com/internetarchive/dweb-mirror/blob/master/configDefaults.yaml#L7`.

You can override this in `dweb-mirror.config.yaml` in the home directory of the
user that runs the server. (Note: on IIAB this is currently `/root/dweb-mirror.config.yaml`.)
(See 'Advanced' below.)

Archive's `Items` are stored in subdirectories of the first of these
directories found, but are read from any of the locations.

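
The lookup order can be pictured with a short sketch. This is not dweb-mirror's code. The real list of places lives in `configDefaults.yaml` (linked above). Here the candidate mount points are simply passed as arguments, with `~/archiveorg` as the fallback described above. The function name `archive_dirs` is hypothetical. It prints every `archiveorg` directory found, in order. Under the rule above, the first line is where new items would be written, and every line is a place items can be read from.

```sh
# Hypothetical sketch of the lookup described above (not dweb-mirror's implementation).
#   $@ = candidate mount points to check, e.g. /media/pi/*
# Prints each existing <mount>/archiveorg in order; if none exist, prints $HOME/archiveorg.
archive_dirs() {
    found=0
    for mount in "$@"; do
        if [ -d "$mount/archiveorg" ]; then
            echo "$mount/archiveorg"
            found=1
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "$HOME/archiveorg"
    fi
}
```

For example, `archive_dirs /media/pi/* | head -n 1` shows which directory would receive new items under these assumptions.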
If your disk space is getting full, it's perfectly safe to delete any
subdirectories (except `archiveorg/.hashstore`); the server will refetch anything else it needs
the next time you browse to the item while connected to the internet.
It's also safe to move directories to an attached USB drive
(underneath an `archiveorg` directory at the top level of the disk).
It is also safe to move attached USB drives from one device to another.

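
When you are deciding what to delete or move, it helps to see which item directories are largest. This is a minimal sketch using the standard `du`, `sort` and `head` commands, and the function name `largest_items` is hypothetical. The shell glob `*` does not match names starting with a dot, so `archiveorg/.hashstore` is never listed. That fits the rule above that it must not be deleted.

```sh
# Hypothetical helper: list the N largest item directories (sizes in KiB)
# under an archiveorg directory, largest first.
#   $1 = archiveorg directory, e.g. ~/archiveorg or /media/pi/foo/archiveorg
#   $2 = how many to show (default 10)
# The glob "$dir"/*/ skips dot-directories, so .hashstore is never listed.
largest_items() {
    dir=$1
    count=${2:-10}
    for item in "$dir"/*/; do
        [ -d "$item" ] || continue   # the glob matched nothing
        du -sk "$item"
    done | sort -rn | head -n "$count"
}
```

For example, `largest_items ~/archiveorg 5` shows the five biggest items. You can then delete them, or move them under the `archiveorg` directory on a USB drive.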
Some of this functionality for handling disks is still under active development,
but most of it works now.

### Maintenance

@@ -109,7 +324,7 @@ If you are worried about corruption, or after for example hand-editing or
moving cached items around.
```
# Run everything as root
sudo su
# cd into location for your installation
cd /opt/iiab/internetarchive/node_modules/@internetarchive/dweb-mirror
./internetarchive -m
```

@@ -122,11 +337,7 @@ cached, just to rebuild a table of checksums.
Most functionality of the tool is controlled by two YAML files, the second of
which you can edit if you have access to the shell.

You can view the current configuration by going to `/info` on your server
(for example [http://box.lan:4244/info](http://box.lan:4244/info)).

The default and user configurations are displayed as items `0` and `1` in
the `/info` response.

@@ -139,86 +350,20 @@ understand how yaml works before editing this file, if you break it, you can
copy a new default from
[dweb-mirror.config.yaml on the repo](https://github.com/internetarchive/dweb-mirror/blob/master/dweb-mirror.config.yaml).

Note that this file is also edited automatically when the Crawl button
described above is clicked.

As the project develops, this file will be more and more editable via a UI.

## More info

Dweb-Mirror lives on GitHub at:

* dweb-mirror (the server): [source](https://github.com/internetarchive/dweb-mirror)
  and [issues tracker](https://github.com/internetarchive/dweb-mirror/issues)
* dweb-archive (the UI): [source](https://github.com/internetarchive/dweb-archive)
  and [issues tracker](https://github.com/internetarchive/dweb-archive/issues)

This project is part of the Internet Archive's larger Dweb project; see also:

* [dweb-universal](https://github.com/mitra42/dweb-universal) for info about others working to bring access offline.
* [dweb-transports](https://github.com/internetarchive/dweb-transports) for our transport library to IPFS, WebTorrent, WOLK, GUN etc.
* [dweb-archivecontroller](https://github.com/internetarchive/dweb-archivecontroller) for an object-oriented wrapper around our APIs.