  • Using different hosters for domain and subdomain

    Recently, for an active project, I wanted to point the main domain sprakit.com to one hoster and the subdomain beta.sprakit.com to another. Technically this is not a big deal, but it’s easier with a basic understanding of how DNS records work.

  • Croatian POS Tagging

    POS tagging for less common languages usually takes some effort to set up. Luckily, for Croatian, Željko Agić has created a very good POS tagger, licensed under CC-BY-SA-3.0. It is based on the hunpos package, which was originally created for Hungarian and is licensed under the New BSD License.

  • Analyzing the Common Crawl using Map-Reduce

    Let’s analyze some real data using Map-Reduce. Common Crawl is a crawl of the web run by a nonprofit organization (though they seem to have sponsors who pay for resources, and they’re even hiring employees). Their datasets are provided in a public S3 bucket, free to download. We will analyze the data using Hadoop (in my case on Amazon’s EMR). At first I tried to use Disco, but it took a lot of effort, and eventually I got stuck on a problem too hard to justify investing more time.

  • Training your custom classifier in TensorFlow Inception image recognition

    Just a few months ago, Google released code for classifying images using neural networks. Some time later, they also released code for training your own custom models, either from scratch or by improving a baseline model. The baseline model in that case is usually one trained on the ImageNet dataset.

  • Using Map-Reduce on Graphs

    Map-Reduce seems to be the standard technology for working with large amounts of data these days. It is best known in combination with simple, flat, table-like structures, perhaps because most beginner tutorials focus on these.
