I'm working on a project to crawl and index a large amount of online content. The crawled data is fed into an index and queried for different specialized topics.
This project has been running for some months now, the prospects are great, and we're looking forward to growing the team.
That's where you come into play. There are many areas where you could take over some work: we're using Hadoop, HBase and Zookeeper. Maybe we could make use of some parts of Bixo or Nutch. The Lucene index needs to be scaled, either with Katta, elasticsearch or some
homegrown solution on top of HBase. We'll also need to scale the number of servers, so a background in system administration and configuration management would help.
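To give a flavour of the kind of code involved, here is a minimal, illustrative sketch of feeding one crawled page into a Lucene index. The class name, index path, field names and example URL are made up for this post, not our actual schema, and in the real pipeline documents would come from HBase rather than string literals:

    import java.nio.file.Paths;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.FSDirectory;

    // Illustrative only: index path and field names are assumptions for this sketch.
    public class CrawlIndexer {
        public static void main(String[] args) throws Exception {
            try (FSDirectory dir = FSDirectory.open(Paths.get("crawl-index"));
                 IndexWriter writer = new IndexWriter(dir,
                         new IndexWriterConfig(new StandardAnalyzer()))) {
                Document doc = new Document();
                // Store the URL verbatim so search results can link back to the page.
                doc.add(new StringField("url", "http://example.com/page", Field.Store.YES));
                // Analyze the page body for full-text search; no need to store it here.
                doc.add(new TextField("content", "text extracted from the crawled page",
                        Field.Store.NO));
                writer.addDocument(doc);
            }
        }
    }

Scaling exactly this kind of indexing step across many machines, whether via Katta, elasticsearch or something on top of HBase, is one of the open problems you'd work on.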
What's cool about the job?
- First, you'll work with me.
- You will work with interesting technology: Big Data, Crawling, Search
- YMC is a nice place where I can combine internal projects with contributions to Debian
- Lake Constance is one of the nicest tourist destinations in Europe
- Although it's a tourist region, you don't have to miss Free Software events: it's one hour to Zurich, two and a half to Stuttgart, and we run a local web-dev group with around 20 members. Night trains from Zurich take you to Cologne, Berlin or Vienna.
If all this sounds good to you, then have a look at
www.ymc.ch/unternehmen/jobs!