What is Data?
Data is a raw, unorganized set of things that needs to be processed to have meaning. Raw data is like raw intelligence: useless on its own.
What is Information?
Information is data that has been processed, organized, structured, or presented in a given context so as to make it useful.
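To make the difference concrete, here is a minimal sketch of turning raw data into information. The readings, the city names, and the `summarize` helper are all made up for illustration; the point is only that the same values become useful once they are grouped and summarized in a context.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw data: unorganized (city, temperature in Celsius) readings.
raw_readings = [
    ("Lisbon", 21.0),
    ("Oslo", 9.0),
    ("Lisbon", 23.0),
    ("Oslo", 11.0),
    ("Lisbon", 22.0),
]


def summarize(readings):
    """Group readings by city and return the mean temperature per city."""
    by_city = defaultdict(list)
    for city, temp in readings:
        by_city[city].append(temp)
    return {city: mean(temps) for city, temps in by_city.items()}


# Information: the same data, organized so that it answers a question.
information = summarize(raw_readings)
for city, avg in sorted(information.items()):
    print(f"{city}: average {avg:.1f} C")
```

The list of tuples is the data; the per-city averages are the information.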
What is Information Science?
“Information Science” seems to be a more appropriate term than “Data Science”, but it is too late to go back.
What is Data Science?
A common claim is that “Data Science is when you are dealing with Big Data, large amounts of data”. That is not true: Data Science can be applied to a data set with one thousand lines without any problem.
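As a rough illustration of that point, the sketch below fits a straight line to a synthetic data set of exactly one thousand rows using ordinary least squares. The data, the true slope and intercept (3 and 2), and the `fit_line` helper are assumptions made up for this example; nothing here needs a cluster or a Big Data tool.

```python
import random

random.seed(0)  # fixed seed so the synthetic data is reproducible

# Hypothetical data set with one thousand rows: y = 3x + 2 plus Gaussian noise.
rows = []
for _ in range(1000):
    x = random.uniform(0, 10)
    y = 3.0 * x + 2.0 + random.gauss(0, 1)
    rows.append((x, y))


def fit_line(data):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(rows)
print(f"rows: {len(rows)}, slope: {slope:.2f}, intercept: {intercept:.2f}")
```

The fitted values should land close to the slope and intercept used to generate the data, which is already a small but complete Data Science task: a model, some data, and an insight.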
The creation of Data Science in simple words: two sides that were not fully connected had to merge in the new fast-paced, technological world:
- Statistics/mathematics: formulate proper models to generate insights;
- Computer science: build the bridge between the models and the data so that results arrive in a feasible time;
- There are only two sides because Machine Learning is entirely based on math and stats;
- Theoretical computer science could itself be considered a branch of mathematics.
If you know these topics well, you are a Data Scientist:
- Linear algebra
- Non-linear systems, dynamic systems
- Analytical geometry
- Optimization
- Calculus
- Statistics and probability
- Programming languages (R, Python, SAS, JavaScript)
- Software: Excel, IBM SPSS, SAS Enterprise Miner
- General DS & MLaaS platforms:
  - IBM Watson Studio & Analytics
  - Azure Machine Learning
  - Google Cloud Machine Learning
  - H2O
  - BigML
  - RapidMiner and KNIME
  - Amazon SageMaker
- Data visualizations: Power BI, Tableau, R/Python using plotly/ggplot/highcharts
- Machine Learning (supervised, unsupervised and reinforcement learning)
- Big Data (MapR, Redshift, Snowflake, BigQuery, Cassandra, Hadoop, Spark)
- Hardware (CPU, GPU, TPU, FPGA, ASIC)
Hey, will you be a data scientist?
Some helpful resources
As I worked on projects, I found these resources helpful. Remember, resources on their own aren't useful -- find a context for them:
- Khan Academy -- good basic statistics and linear algebra content.
- Introduction to Linear Algebra, 4th Edition -- Great linear algebra book by Gilbert Strang.
- Calculus Online Textbook (MIT OpenCourseWare) -- also by Gilbert Strang, a great calculus book.
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition -- a good machine learning book.
- Andrew Ng’s Online Machine Learning Class -- the original Coursera class.
- OpenIntro Statistics -- Good basic stats book.
- https://scholar.google.com -- A paper can be a great way to learn about a topic. For example, here's Breiman's original random forest paper: http://link.springer.com/article... .