About the Company:
CrowdANALYTIX is a crowdsourced analytics service focused on partnering with life sciences and professional services firms globally. CrowdANALYTIX operates a platform in which a large community of independent analytical experts solve business problems by competing in data science competitions.
CrowdANALYTIX currently has data scientists on its platform from 50 countries, many with PhDs and Master's degrees in Statistics and Machine Learning. CrowdANALYTIX is backed by Accel Partners and SAIF Partners and is based in Silicon Valley, California.
Desired Experience: 1-4 years
Salary: INR 3 – 6 LPA
Tentative Date of Interview: Will be communicated after the registration window closes
Job Description:
-Develop a large-scale, real-time web data crawling system and storage platform. The data could come from reviews, blogs, product catalogs, social sites, or travel data: essentially anything that is publicly available. Should be able to use wrapper applications to crawl HTML or XML files.
-Convert the extracted unstructured web data into structured data using tools such as Apache Spark and data stores such as MongoDB or HDFS.
-Build an API for product and pricing intelligence, and index each field of the crawled data to support comparison and text search across large datasets.
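As a rough illustration of the kind of work described above, the following is a minimal sketch, not the company's actual pipeline. It assumes a hypothetical page markup in which product fields are tagged with the class names "name", "price", and "review"; the sample pages, class names, and function names are all invented for the example. It turns raw HTML into a structured record and builds a simple per-field index for text search, using only the Python standard library.

```python
from collections import defaultdict
from html.parser import HTMLParser


class ProductParser(HTMLParser):
    """Collects text from elements whose class is in FIELDS (hypothetical markup)."""

    FIELDS = {"name", "price", "review"}

    def __init__(self):
        super().__init__()
        self.record = {}
        self._field = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if cls in self.FIELDS:
            self._field = cls

    def handle_data(self, data):
        # Store the first non-empty text that follows a recognised tag.
        if self._field and data.strip():
            self.record[self._field] = data.strip()
            self._field = None


def parse_product(html):
    """Convert one unstructured HTML snippet into a structured dict."""
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    record = dict(parser.record)
    if "price" in record:
        # Normalise "1,299.00" to a number so prices can be compared.
        record["price"] = float(record["price"].replace(",", ""))
    return record


def build_index(records):
    """Map (field, lowercase token) to the set of record positions containing it."""
    index = defaultdict(set)
    for pos, record in enumerate(records):
        for field, value in record.items():
            for token in str(value).lower().split():
                index[(field, token)].add(pos)
    return index


def search(index, field, term):
    """Return sorted positions of records whose field contains the term."""
    return sorted(index.get((field, term.lower()), set()))


# Invented sample pages standing in for crawled data.
PAGES = [
    '<div><span class="name">Acme Kettle</span>'
    '<span class="price">1,299.00</span>'
    '<p class="review">Boils fast</p></div>',
    '<div><span class="name">Acme Toaster</span>'
    '<span class="price">899.50</span>'
    '<p class="review">Toasts evenly and fast</p></div>',
]

records = [parse_product(page) for page in PAGES]
index = build_index(records)
```

Calling search(index, "review", "fast") returns the positions of both sample records, while search(index, "name", "kettle") returns only the first. At real scale the parsing step would more likely run as a distributed job and the index would live in a dedicated store, but the shape of the task is the same.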
Skill Set Required:
-Managing multiple servers (several EC2 and other servers). This includes maintaining system health, monitoring, upgrading and patching software, writing scripts to automate day-to-day tasks, and scaling the infrastructure as required.
-Systems are mainly Linux/Unix based; the other tools and databases vary.
Interview Mode: Face to face
Eligible Qualifications: B.Sc., B.Tech./B.E., BCA, M.Tech./M.E.