Big Data Denmark is a community of data scientists, software developers and analysts that meets regularly to discuss the big data and data science concepts, ideas, tools, methods, models and technologies used for analyzing and processing large-scale data, extracting meaning from it and gaining insight.
The goal is to bring together people from various industries in Denmark who are interested in working with large amounts of data, and to offer interesting talks, hands-on advice and a forum for exchange and networking.
The community's focus is not solely technical; it also strives to discuss, discover and communicate which kinds of problems and business areas Big Data and Data Science can support, solve and improve.
Our members like to discuss topics such as: big data analytics, Hadoop, Pig, data mining, machine learning, Mahout, predictive analytics, Spark, neural networks, heuristics, Storm, statistical computing, the R language, Python, large-scale text mining, and data science opportunities and challenges.
Making sense of data is the first step towards extracting value from it, and often more important than being able to scale the processing to large volumes. Thus, it is vital to have the right skills and tools for understanding your data and doing exploratory analysis. This talk looks at the hidden potential of big data analysis and how data science relates to big data. We will walk through the process of data analysis and show practical examples of data analysis in use in Denmark.
Some decisions cannot wait days, hours or even minutes; the analysis needs to happen as soon as data arrives. Many business areas benefit from being real-time rather than delayed, including financial trading, health-care monitoring, fraud detection, and the monitoring of electrical grids, wind turbines, industrial machinery and much more. This talk looks at processing and analyzing large volumes of data in near real time: what kinds of problems, complexity and solutions you run into when doing things in real time, and finally the technologies that exist for doing big data in real time.
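The core idea behind this kind of near-real-time processing — computing an aggregate over a bounded window of recent events as each one arrives, instead of batching and waiting — can be sketched in a few lines of plain Python (the sensor-reading stream is an invented example, not from the talk):

```python
from collections import deque

def rolling_mean(stream, window=5):
    """Yield the mean of the last `window` values as each new value arrives."""
    buf = deque(maxlen=window)  # oldest values fall off automatically
    for value in stream:
        buf.append(value)
        yield sum(buf) / len(buf)

# Example: a stream of sensor readings processed one at a time,
# producing an updated answer immediately after every event
readings = [10, 20, 30, 40, 50, 60]
means = list(rolling_mean(readings, window=3))
# means[0] is 10.0 (one value seen), means[-1] is 50.0 (mean of 40, 50, 60)
```

Real streaming engines such as Storm or Spark Streaming distribute exactly this pattern — windowed state plus per-event updates — across many machines.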
Big web companies such as Google, Amazon and Yahoo pioneered Hadoop and similar technologies over a decade ago, building their value offerings on inexpensive, resilient distributed data processing. While the data processing capabilities of these companies are impressive, most businesses operate on much smaller volumes of data. What benefits can companies with these more modest data sets harvest from Hadoop? This talk takes a deep dive into the Hadoop platform and illustrates several real-life examples of how the technology can benefit businesses in general.
Meetup in cooperation with Rhus.club, the new R user group in Aarhus
We R happy to announce the first event in the Rhus.club! We will have a visit by Sebastian Brandes, Tech Evangelist @ Microsoft!
Description: Executing R jobs in the cloud is the new "hot drug", and Microsoft is your pusher! In recent years Microsoft has worked intensively on developing new cloud services such as HDInsight and Azure Machine Learning, and R is fully supported. In this session, Sebastian Brandes will describe how to take advantage of Azure and R, and show us how to do it in practice. Expect a demo-intensive presentation with many tips and tricks for getting started with Azure!
Have you ever thought about how Spotify generates suggestions for new songs you should listen to? Or how YouTube can generate automatic playlists? Or how Stack Overflow comes up with suggestions for new forum threads that might be of interest to you? The answer is recommendation engines, and in this session we will talk about how you can build your own engine using HDInsight/Hadoop in Azure.
The session is for everyone, regardless of prior experience (or lack thereof) with the subject. The main focus will be on how recommendation engines (and machine learning in the cloud in general) work; then there will be a short explanation of the mathematics behind them, and lastly we will cover how to get started training one's own engine.
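The mathematical heart of many such engines is simpler than it sounds. A minimal sketch of item-item collaborative filtering in plain Python — recommend the item whose rating pattern is closest, by cosine similarity, to one the user already likes — is shown below (the tiny ratings matrix is invented for illustration; a real engine on HDInsight/Hadoop would compute the same similarities at scale):

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Rows = items, columns = users; 0 means "not rated" (toy data)
ratings = {
    "song_a": [5, 4, 0, 1],
    "song_b": [4, 5, 0, 2],
    "song_c": [0, 1, 5, 4],
}

def most_similar(item, ratings):
    """Return the other item whose rating pattern is closest to `item`."""
    return max((other for other in ratings if other != item),
               key=lambda other: cosine(ratings[item], ratings[other]))

# song_a and song_b were rated highly by the same users,
# so a listener of song_a gets song_b recommended
most_similar("song_a", ratings)
```

Libraries such as Mahout and Spark's MLlib package this same idea, plus matrix-factorization variants, as ready-made distributed jobs.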
Sebastian Brandes is a Tech Evangelist in Microsoft's Developer Experience department. He works with apps, Azure and developer tools such as Visual Studio, and has worked with the treatment and processing of large data volumes in the cloud since 2011. He works closely with Microsoft's internal Center of Excellence for Data Insights and has given several internal and external presentations on data processing and statistics using a wide range of Microsoft products.
Keeping up with modern trends in Big Data, Comiit is sending representatives to the Strata Conference in London. The conference is one of the largest of its kind in the world, a three-day safari through the field of Big Data. Some of the world's leading speakers on Big Data will be attending, so if you want to learn what the movers and shakers are up to, then this is the conference for you.
You can read more about the conference here: http://strataconf.com/big-data-conference-uk-2015
To be clear: this conference is pay-to-participate, and we are simply participants. However, we wanted to see how many of our colleagues will be joining us this year, so we started this meetup to find out beforehand who will be joining us in London.
If any of our colleagues want to join us later in the evening to discuss the presentations of the day over a cold beer, then that is simply an added bonus.
The extensive list of over 100 speakers for the conference can be seen here:
kdb+ is a high-performance, high-volume database designed from the outset in anticipation of vast increases in data volumes. The database incorporates its own powerful query language, q, so that analytics can be run directly on the data, supporting real-time analysis of billions of records and fast access to terabytes of historical data. The main focus will be on exploring the key features of kdb+, how it compares to other technologies, and some practical use cases. kdb+ is widely used in the finance sector for market data servers, back-ends for trading applications, investment funds, and many other applications.
Krishan Subherwal is based in the First Derivatives headquarters in Newry. Krishan is a kdb+ developer with two years' experience, part of the Quantitative and Derivative Strategies Group within a major US investment bank. The team is tasked with developing an equity derivatives analytics portal, to be used globally by clients for visualization and analysis of equity volatility data and derivative trading strategies.
Apache Hive has become a great way for people to start using Hadoop, as it features an SQL-like query language. But Hive is not a regular database, and Hive's query language is not SQL. In this talk, I will give you a tour of Hive and focus on the areas where Hive is different, and where you might get a few surprises when you start using Hive in practice. The talk will include some live demos, and the topics touched upon include Tez vs. MR, the ORC format, window functions and transforms.
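Window functions are one of the areas where Hive goes beyond what newcomers expect from "SQL on Hadoop". Conceptually, a clause like `AVG(amount) OVER (PARTITION BY user ORDER BY day ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)` computes, for every row, an aggregate over that row's neighbourhood within its partition. A rough plain-Python model of those semantics (toy data invented here, not the speaker's demo):

```python
from itertools import groupby
from operator import itemgetter

rows = [  # (user, day, amount) — invented sample data
    ("alice", 1, 10), ("alice", 2, 20), ("alice", 3, 30),
    ("bob",   1, 100), ("bob",  2, 200),
]

def windowed_avg(rows, preceding=2):
    """Per partition (user), ordered by day: the average over the current
    row plus up to `preceding` earlier rows — what
      AVG(amount) OVER (PARTITION BY user ORDER BY day
                        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
    would produce. Returns (user, row_number, avg) tuples."""
    out = []
    for user, group in groupby(sorted(rows), key=itemgetter(0)):
        amounts = [amount for _, _, amount in group]
        for i in range(len(amounts)):
            frame = amounts[max(0, i - preceding): i + 1]
            out.append((user, i + 1, sum(frame) / len(frame)))
    return out
```

Unlike `GROUP BY`, every input row survives with its own aggregate attached — which is exactly why window functions are so useful for running totals and moving averages over event data.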
Martin Qvist is an HPC Specialist at Vestas Wind Systems. He studied mathematics at Aalborg University, working on traffic modeling in computer networks as well as some approximation-theoretic topics in subdivision surfaces. Since then he has worked as a software developer and subsequently started his own consultancy company, which he ran until he joined Vestas.
As always, the presentation is free, and refreshments will be provided for all who sign up.
There is a maximum of 40 seats available for this presentation.
Are you interested in applying machine learning to current research and industry problems?
Then join us at our second Big Data Denmark meetup in Copenhagen, where we will, together with Data Scientist Peter Sergio Larsen, take a deep dive into machine learning with Apache Spark.
We will present and discuss use cases where Peter and his team applied Data Discovery techniques and training of Machine Learning models using Spark and Python libraries.
Peter Sergio Larsen is the Chief Data Scientist at Visma Consulting. His experience ranges from software development and the implementation of advanced machine learning models to business understanding and the preparation of business cases.
In this highly technical talk you can expect us to explore the following:
This talk features many interesting, audience-interactive demos, as well as code-level deep dives into many of the projects listed above.
All demo code is available on Github at the following link: https://github.com/fluxcapacitor/pipeline/wiki
In addition, the entire demo environment has been Dockerized and made available for download on Docker Hub at the following link: https://hub.docker.com/r/fluxcapacitor/pipeline/
Chris Fregly is a Principal Data Solutions Engineer at the newly formed IBM Spark Technology Center, an Apache Spark contributor, a Netflix open source committer, the organizer of the global Advanced Apache Spark Meetup, and the author of the upcoming book Advanced Spark.
Today, when storage has become cheap in multi-tenant cloud environments, it is possible to load and store ever-increasing volumes of data. This has spurred the generation and collection of IoT data, such as sensor measurements, for analysis and forecasting.
Collecting and running forecasts on measurements from billions of sensors, at petabyte scale and in near real time (i.e. within seconds of the actual data generation), poses distinct architectural considerations for the system design: how to organize data in the storage layer, and how to implement the machine learning algorithms that do the forecasting, while at the same time supporting high-rate data ingest and highly concurrent data access.
In this talk Helen will present a case study of a utility company that collects measurements from water and heat controllers installed in a wide base of households. The company uses the measurements to plan for peak usage times, to detect leakage and to predict maintenance costs. In this case study the open source Hadoop ecosystem was chosen, running in the commercial multi-tenant Azure cloud. The case study covers data organization in the Hadoop NoSQL database HBase to support near-real-time forecasting and to serve access to a wide base of public consumers.
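A concrete design decision hiding behind "data organization on HBase" is the row key. One common pattern for sensor time series — sketched below under the assumption of a salted, sensor-then-reversed-timestamp layout; the key scheme is illustrative, not necessarily the one in the case study — keeps each sensor's newest readings first in a scan and spreads write load across regions:

```python
import zlib

def row_key(sensor_id: str, epoch_millis: int, buckets: int = 16) -> str:
    """Build an HBase-style row key: salt|sensor|reversed-timestamp.

    - salt: a stable hash-bucket prefix, so billions of sensors writing
      ever-increasing timestamps don't all hit the same hot region
    - reversed timestamp: (MAX - ts), so lexicographic order within one
      sensor's keys runs newest-first — a prefix scan returns the most
      recent readings immediately, which is what near-real-time
      forecasting needs
    """
    salt = zlib.crc32(sensor_id.encode()) % buckets  # deterministic salt
    max_millis = 10**13  # any constant larger than all timestamps used
    return f"{salt:02d}|{sensor_id}|{max_millis - epoch_millis:013d}"
```

All keys for one sensor share the same `salt|sensor|` prefix, so a single prefix scan serves a consumer's "latest readings" query without touching historical rows.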