
Data Scientist - Belgium  

OntoForce (company)


Posted on: 17 January 2017

Project Description

 The Data Scientist is part of the data team, which is responsible for the semantic conversion, integration (ETL) and management of new and existing data sources. The Data Scientist will be responsible for the ETL (extract, transform, load) process for new and existing data sources, and for the semantic configuration (linking and mapping) of all data sources using semantic web technologies (RDF, the Resource Description Framework) and Bash and Python scripting. The Data Scientist will also keep the data up to date by managing the automated data update pipeline.

The Data Scientist should have a thorough understanding of the fundamentals of the semantic web and linked data, as well as domain knowledge in the life sciences, or be confident that s/he can become such an expert.
Because our products must meet very high standards, so must our staff. The candidate therefore needs outstanding technical skills, an understanding of the systems used in Big Data integration, web development and linked data/semantic web, and good communication skills.
Your responsibilities:  
  • Report to the Chief Data Officer 
  • Implement the ETL process of new data sources 
  • Keep data up-to-date 
  • Optimize our integration ontology 
  • Perform query optimization 
  • Semantic aggregation and linking of data 
  • Optimize, maintain and support our data storage solutions 
  • Work closely with the software architects and team members, and ensure up-to-date technical knowledge for yourself and for the team 
  • Be prepared to travel occasionally  

Your profile:  
  • Degree in computer science/informatics, engineering or life sciences (or equivalent experience) 
  • 3 to 10 years of IT experience 
  • Profound experience with scripting (e.g. Python) 
  • Experience with Linux OS 
  • Experience with life sciences related data: drug development, omics, chemistry, medical, literature and economic data 
  • Proficient with distributed version control (e.g. Git, GitHub) 
  • Experience with ETL processes 
  • Knowledge of the semantic web 
  • Experience with Agile methodology 
  • Knowledge of RDF and of relational and graph-based database technologies (SQL, SPARQL, etc.) is a plus 
  • Good knowledge of an ontology editor (e.g. Protégé, TopBraid Composer) is a plus 
  • Experience with text mining technologies is a plus 
  • Knowledge of triplestore databases is a plus 
  • Experience with the configuration of various systems such as firewalls, databases, email servers, VoIP systems, gateways, proxy servers, … (the more the better) is a plus 
  • Experience with Linux system administration is a plus 
  • Other technical knowledge is an advantage  

Your personality:  
  • Result driven 
  • Eager to learn 
  • Passionate 
  • Good communicator 
  • Solution-oriented 
  • Focused on efficiency and effectiveness 
  • Able to think outside the box while keeping the basics in mind 
  • Able to work in a multicultural and multi-site context 
  • Able to cope with the demands of a startup environment: a high pace of change and constant rethinking of existing processes and technologies 
  • Able and wanting to work in an environment which offers a high degree of freedom, demands initiative and expects responsibility