When you’re running websites or any other service, you want to know its state at any time and be informed when something goes wrong. For example, I started molescrape almost one year ago, and long-term website crawling needs some form of application monitoring. During the first few days you might check the status of the crawl manually, but a few weeks later you won’t. At some point your application will fail, and unless something alerts you, you can go days or even weeks without noticing. In this article I will give a short overview of monitoring solutions (especially simple ones for hobby projects) and introduce my own monitoring solution.
On my scraping platform molescrape.com, I have set up constant monitoring of the number of items collected per spider to detect when a spider fails (e.g. because of system problems or because the website changed). Currently, the threshold has to be set manually for each newly added spider. As this is extra effort for the user, I have been working on a system that automatically detects a useful threshold.
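To make the idea of an automatically detected threshold more concrete, here is a minimal sketch. It is not the method used on molescrape; it assumes a list of historical per-run item counts for one spider is available and derives a lower bound from the median and the median absolute deviation (MAD). The names `suggest_threshold`, `is_failing` and the factor `k` are hypothetical and chosen only for illustration.

```python
from statistics import median


def suggest_threshold(counts, k=3.0):
    """Suggest a lower alert threshold from historical item counts.

    Assumption (not from the article): a run counts as suspicious when it
    falls more than k median absolute deviations below the median.
    """
    counts = list(counts)
    if not counts:
        raise ValueError("need at least one historical count")
    med = median(counts)
    # Median absolute deviation: a spread estimate that ignores outliers.
    mad = median([abs(c - med) for c in counts])
    # Item counts cannot be negative, so clamp the threshold at zero.
    return max(0.0, med - k * mad)


def is_failing(latest_count, threshold):
    """Return True when the latest run collected fewer items than allowed."""
    return latest_count < threshold


history = [100, 98, 105, 102, 97, 101, 99]
threshold = suggest_threshold(history)
print(threshold, is_failing(50, threshold), is_failing(99, threshold))
```

The median and MAD are used here instead of mean and standard deviation so that a single broken run in the history does not drag the threshold down. A spider with strong weekly patterns would likely need a more careful model.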
The most commonly used mining program for mining Monero at the moment is xmr-stak. Some time ago it was split into separate projects for
xmr-stak-gpu, but now they are both combined into
When dealing with messaging systems, there are a lot of options available, from classical message brokers to simple libraries that handle the messaging logic without a central server. Almost all of them differ in some way, and each of them has a reason to exist. In this article I will compare a few popular and very different ones, namely the message broker RabbitMQ, the distributed streaming platform Kafka, the socket and concurrency library ZeroMQ and the lightweight MQTT broker Mosquitto. You will see that each of them has its own advantages and differences from the others, and that you should choose one according to your needs.
A crawler is basically very simple and quick to program. We send a request for a specific URL to a server and wait for the response. We store the response, and the first request is done. Now we only need to extract the links from the page and send new requests, and the crawler is finished.
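The request, store, extract and repeat loop described above can be sketched in a few lines of Python using only the standard library. This is an illustrative sketch, not the crawler from the article: the names `LinkExtractor`, `extract_links`, `fetch_http` and `crawl`, as well as the `max_pages` limit, are assumptions made for the example, and it deliberately ignores things a real crawler needs, such as robots.txt, rate limiting and retries.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href values of all <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute URLs (without fragments) for all links in html."""
    parser = LinkExtractor()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, href)).url for href in parser.links]


def fetch_http(url):
    """Send a request for one URL and return the response body as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch_http, max_pages=10):
    """Breadth-first crawl: fetch a page, store it, queue its links."""
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            pages[url] = fetch(url)  # store the response
        except OSError:
            continue  # skip pages that could not be fetched
        for link in extract_links(url, pages[url]):
            if link.startswith(("http://", "https://")) and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Passing the `fetch` function as a parameter is a design choice for this sketch: it lets the loop be tried out against an in-memory dictionary of pages before pointing it at a real website.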