Developer(s) | Plantaest |
---|---|
Initial release | June 22, 2024 |
Stable release | 0.1.0-alpha.1 |
Repository | feverfew on GitHub |
Written in | Java, TypeScript, Python |
Platform | Toolforge |
Available in | Multi-language |
Type | Link checker tool |
License | AGPL-3.0 |
Website | Feverfew |
Feverfew is a link checker tool deployed on Toolforge, developed by Plantaest.
Usage
Checking an article
To start using Feverfew, follow these steps:
First, visit the Feverfew homepage at https://feverfew.toolforge.org/.
Next, once the website interface appears, you can begin using it.
Select the wiki that contains the article you want to check, such as English Wikipedia, which has the code enwiki. Then enter the title of the article and press the Check button to start checking the links in the article.
Feverfew should complete the check and return the result within 2–30 seconds.
Reading the result
The result page interface consists of three main sections:
- The header section provides some information about the page being checked, such as the page title, page ID, wiki ID, and the time the check was performed. Additionally, there are two buttons on the top right:
- The first button, which looks like a link icon, leads to the archived result of the check. This URL follows the format: https://feverfew.toolforge.org/check/archive/{check_id}. This link can be shared on Wikipedia to inform others about the status of the article's links.
- The second button, which looks like an external link icon, leads to the revision of the page at the time of the check.
- The middle section features four colored boxes:
- Blue Box: Shows the total number of links extracted from the article's source code.
- Gray Box: Indicates the number of links that the application ignored during the check.
- Green Box: Displays the number of links that the machine learning model evaluated as working.
- Red Box: Displays the number of links that the machine learning model evaluated as broken.
- The final section is a detailed list of each link's results, containing the following information:
- Index number: Along with the index number of the link, there might be the index number of the reference.
- Probability score: Indicates the likelihood of the link being broken, expressed as a percentage. Links with a score below 50% have a green background, while those above 50% have a red background.
- Hostname: The name of the server hosting the link.
- HTTP status: The status code returned by the server, for example 200 (the page loaded successfully) or 404 (the page was not found). See more: HTTP response status codes.
- Load time: The time it took to load the link, measured in milliseconds.
- Page size: The size of the page, measured in bytes.
- Reference name: If the link is part of a reference, its name will be included. References with odd index numbers have a purple background, while those with even index numbers have an orange background.
- Number of redirects: If any.
- Link text: The text of the link, with a copy button next to it.
- Bare link: The raw URL, with a copy button next to it.
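As a rough illustration of how the four colored boxes relate to the per-link results, the sketch below models one row of the list and derives the box counts from it. The `LinkResult` class, its field names, and the choice to treat a score of exactly 50% as broken are assumptions made for this example; they are not taken from Feverfew's source code.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: a score of exactly 50% is treated as broken.
BROKEN_THRESHOLD = 0.5


@dataclass
class LinkResult:
    """One row of the results list (hypothetical field names)."""
    index: int
    url: str
    ignored: bool = False              # ignored links are counted in the gray box
    broken_probability: float = 0.0    # 0.0 to 1.0; unused when the link is ignored
    http_status: Optional[int] = None
    load_time_ms: Optional[int] = None
    page_size_bytes: Optional[int] = None
    redirects: int = 0


def summarize(results: list[LinkResult]) -> dict[str, int]:
    """Count links the way the four colored boxes do."""
    checked = [r for r in results if not r.ignored]
    broken = sum(1 for r in checked if r.broken_probability >= BROKEN_THRESHOLD)
    return {
        "total": len(results),                    # blue box
        "ignored": len(results) - len(checked),   # gray box
        "working": len(checked) - broken,         # green box
        "broken": broken,                         # red box
    }


results = [
    LinkResult(1, "https://example.org/a", broken_probability=0.03, http_status=200),
    LinkResult(2, "https://example.org/b", broken_probability=0.97, http_status=404),
    LinkResult(3, "https://example.org/c", ignored=True),
]
print(summarize(results))
# {'total': 3, 'ignored': 1, 'working': 1, 'broken': 1}
```

Under these assumptions the blue box always equals the sum of the gray, green, and red boxes.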
Review feature
To enable the review feature, click the eyeglass icon in the bottom right corner. After clicking, a panel will appear. This panel consists of two columns: the left column contains an embedded frame of the linked website, and the right column displays wikitext content with highlighted links.
For navigation, users can use the mouse to click on individual links or use the following keyboard shortcuts:
- Q: Scroll to the selected link
- A: Select the previous link
- Z: Select the next link
Sometimes, it might not be possible to open the website in the embedded frame. This could be because the website does not allow itself to be embedded in an iframe. In such cases, users will need to open the website directly in the browser to view its content.
Viewing the results list
To view the results list, go to the homepage and click on the Result menu, or directly access the link: https://feverfew.toolforge.org/check.
Viewing a result
To view a result, you can browse the results list and select a result from the list, or directly access a URL in the format: https://feverfew.toolforge.org/check/archive/{check_id}.
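For anyone who wants to generate or recognize these links in a script, for example when posting a result on a talk page, the URL format can be handled with a small helper like the one below. This is only a sketch: the function names are made up for this example, the check ID is treated as an opaque string, and `12345` is a placeholder rather than a real check ID.

```python
from typing import Optional
from urllib.parse import quote, unquote

ARCHIVE_BASE = "https://feverfew.toolforge.org/check/archive/"


def archive_url(check_id: str) -> str:
    """Build the archived-result URL for a given check ID."""
    # The ID is percent-encoded in case it contains unusual characters.
    return ARCHIVE_BASE + quote(str(check_id), safe="")


def check_id_from_url(url: str) -> Optional[str]:
    """Return the check ID from an archive URL, or None if the URL does not match."""
    if not url.startswith(ARCHIVE_BASE):
        return None
    check_id = unquote(url[len(ARCHIVE_BASE):]).strip("/")
    return check_id or None


url = archive_url("12345")  # placeholder ID
print(url)
print(check_id_from_url(url))
```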
Other features
Users can change the interface color scheme, text reading direction, and language using the three buttons in the top right corner.
Feverfew and InternetArchiveBot
Feverfew does not aim to replace InternetArchiveBot. Both tools can be used together to support checking and archiving links in articles. A reasonable usage approach might be:
- First, use Feverfew to conduct a preliminary check of the article's links.
- Next, use InternetArchiveBot to archive a portion (only the dead links) or all links (including both dead and currently live links).
- Then, use Feverfew again to assess the status after the links have been archived.
Misclassification
Since Feverfew uses a machine learning model, errors in evaluation can occur in some cases, meaning it might misclassify working links as broken and vice versa. According to training data, this model achieves an accuracy of 0.82 and an F1 score of 0.80. In general terms, this means the model evaluates about 82 out of 100 links correctly, while the remaining 18 may be misjudged.
Users can utilize additional information, such as the HTTP status in the result, to draw their own conclusions about the link's status.
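To make the two metrics more concrete, the sketch below computes accuracy and F1 from a confusion matrix. The four counts are hypothetical numbers chosen only because they reproduce the reported 0.82 and 0.80; they are not Feverfew's actual evaluation data. Treating "broken" as the positive class is also an assumption.

```python
def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Share of all links that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for the positive (broken) class."""
    precision = tp / (tp + fp)  # flagged links that really are broken
    recall = tp / (tp + fn)     # broken links that were actually flagged
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for 100 links:
# tp = broken links flagged as broken, fp = working links flagged as broken,
# fn = broken links judged working,    tn = working links judged working.
tp, fp, fn, tn = 36, 10, 8, 46

print(f"accuracy = {accuracy(tp, fp, fn, tn):.2f}")  # accuracy = 0.82
print(f"F1 score = {f1_score(tp, fp, fn):.2f}")      # F1 score = 0.80
```

In this made-up example, the 18 errors consist of 10 working links reported as broken and 8 broken links reported as working, which is why the HTTP status and other details are worth checking before acting on a result.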
Software errors
Currently, several issues may arise when using the software:
- If you enter a title that does not exist on the selected wiki, the content of the page cannot be retrieved, and therefore the check cannot be initiated.
- An error may appear when a check takes an unusually long time to complete, even though the check has actually been completed and archived. The timeout for checking links is set at 25 seconds, so if a check takes more than a few minutes, an error has likely occurred.
- Errors may occur if too many check requests are sent within a certain time frame. Currently, the software only allows up to 100 checks per day for each anonymous user session.
- Errors may arise due to the instability of the Toolforge server.
- The index numbers of references may not be accurate.
- Feverfew may not be able to access certain websites, for instance, if the website blocks requests from Amazon servers.
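The daily limit mentioned in the list above can be pictured as a per-session counter. The sketch below is a simple fixed-window counter written for illustration only; the class and method names are hypothetical. Feverfew's back-end lists Bucket4j, a token-bucket library, so its real behavior (for example, how the quota refills) may differ from this simplification.

```python
import time
from typing import Callable

DAILY_LIMIT = 100               # checks per anonymous session per day, as stated above
WINDOW_SECONDS = 24 * 60 * 60   # length of one quota window


class DailyQuota:
    """Fixed-window counter keyed by session ID (illustrative only)."""

    def __init__(self, limit: int = DAILY_LIMIT,
                 clock: Callable[[], float] = time.time):
        self.limit = limit
        self.clock = clock
        # session ID -> (window start time, checks used in this window)
        self._windows: dict[str, tuple[float, int]] = {}

    def try_consume(self, session_id: str) -> bool:
        """Return True and count the check if the session still has quota."""
        now = self.clock()
        start, used = self._windows.get(session_id, (now, 0))
        if now - start >= WINDOW_SECONDS:
            start, used = now, 0  # the window has expired, start a fresh one
        if used >= self.limit:
            return False          # quota exhausted, the request is rejected
        self._windows[session_id] = (start, used + 1)
        return True


quota = DailyQuota()
allowed = [quota.try_consume("session-a") for _ in range(101)]
print(allowed.count(True), allowed[-1])  # 100 False
```

The clock is passed in as a parameter so that the window logic can be tested without waiting a full day.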
Origin
The idea for Feverfew originated from a software tool that wiki communities used in the past to evaluate links, called Checklinks, created by Dispenser (English Wikipedia). However, this software is no longer functional, as its author has been absent since 2020.
Feverfew retains the basic features of Checklinks and is unlikely to add further features, in order to keep the system simple, especially since InternetArchiveBot currently performs well in supporting link archiving.
The foundation for the Feverfew project came from a discussion in 2021 on Vietnamese Wikipedia: Công cụ check link mới (New link-checking tool).
Stable version
Currently, the Feverfew project is in the experimental stage, and it may take quite some time to reach the first stable version, 1.0.0. During this period, the project will continue to gather feedback from users across various wikis to improve and fix any potential issues.
Security
Feverfew does not store any personal information, except for a randomly generated UUID (Universally Unique Identifier) that is hashed using the CRC32 algorithm into a 32-bit integer, with a lifespan of 30 days. This identifier is used to limit the number of checks within the allowed quota and for retrieval purposes if necessary.
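The hashing step can be sketched as follows. This is a minimal illustration using Python's standard library, not Feverfew's own code (the back-end is written in Java). The function name is hypothetical, and hashing the canonical string form of the UUID is an assumption; the real implementation may hash a different representation, such as the raw 16 bytes.

```python
import uuid
import zlib


def session_hash(session_uuid: uuid.UUID) -> int:
    """Hash a UUID to an unsigned 32-bit integer with CRC32.

    Assumption: the canonical string form (for example
    "123e4567-e89b-12d3-a456-426614174000") is what gets hashed.
    """
    return zlib.crc32(str(session_uuid).encode("ascii")) & 0xFFFFFFFF


session_id = uuid.uuid4()           # randomly generated (version 4) UUID
hashed = session_hash(session_id)
print(session_id, "->", hashed)     # the values differ on every run
```

Because a 128-bit identifier is reduced to 32 bits, the original UUID cannot be recovered from the stored integer, and two different UUIDs can occasionally produce the same value.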
Source code
The source code is stored on GitHub: https://github.com/plantaest/feverfew. Those interested and with a GitHub account can star the repository to show support. Currently, the project does not have specific guidelines for contributing code, so contributions will be encouraged at a later, more appropriate time.
The project architecture includes the following components:
- Front-end is a React application written in TypeScript, using notable libraries such as Mantine, React Query, Legend State, i18next, React Router, and Valibot, and built with Vite.
- Back-end is a Quarkus application written in Java, with additional libraries including TSID, Unirest, Jsoup, ONNX Runtime, Bucket4j.
- External server is Caddy Server, serving static files of the front-end and acting as a reverse proxy for the back-end.
- Both Caddy and Quarkus servers run on a Kubernetes pod via Toolforge's internal webservice, configured with 3 CPUs, 6 GB RAM, 2 replicas, and running Debian OS. The Quarkus application on Toolforge runs on the JVM.
- AWS Lambda Function is a Quarkus application performing link request creation tasks, deployed on AWS with 4 instances in the same region, memory limit of 512 MB, and a timeout of 30 seconds. The Quarkus application on AWS Lambda is a native image created by GraalVM CE.
- Machine learning model is designed and built using scikit-learn in Python, and converted to ONNX format using the skl2onnx library.