2. R possesses an extensive catalog of statistical and graphical methods. It includes machine learning algorithms, linear regression, time series, and statistical inference to name a few.
3. Most R libraries are written in R itself, but C, C++ and FORTRAN code is preferred for computationally heavy tasks.
4. R is trusted not only in academia; many large companies, including Uber, Google, Airbnb and Facebook, also use the R programming language.
5. Data analysis with R is done in a series of steps: programming, transforming, discovering, modeling and communicating the results.
6. Program: R is a clear and accessible programming tool
7. Transform: R is made up of a collection of libraries designed specifically for data science
8. Discover: Investigate the data, refine your hypotheses and analyze them
9. Model: R provides a wide array of tools to capture the right model for your data
10. Communicate: Integrate code, graphs, and outputs into a report with R Markdown, or build Shiny apps to share with the world
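As a rough illustration of the transform, discover and model steps above, here is a minimal sketch using the mtcars data set that ships with R. The choice of data set and of the columns mpg, wt and cyl is an assumption made for illustration, not part of the original notes.

```r
# Transform: keep the columns of interest and convert cyl to a factor
cars <- mtcars[, c("mpg", "wt", "cyl")]
cars$cyl <- factor(cars$cyl)

# Discover: average fuel economy per cylinder group
mpg_by_cyl <- aggregate(mpg ~ cyl, data = cars, FUN = mean)
print(mpg_by_cyl)

# Model: linear regression of fuel economy on weight
fit <- lm(mpg ~ wt, data = cars)
print(coef(fit))
```

The communicate step would typically wrap code like this in an R Markdown document or a Shiny app.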
Q.2. Explain the Basic Features of R.
Answer: The R programming language has many features that are useful to data scientists and analysts. These key features set R apart from other statistical languages:
1. Open-source
a) R is an open-source software environment. It is free of cost and can be adjusted and adapted according to the user’s and the project’s requirements.
b) You can make improvements and add packages for additional functionality.
c) R is freely available: you can download it, install it and start practicing.
2. Strong Graphical Capabilities
a) R can produce static, production-quality graphics and has extension libraries that provide interactive graphics.
b) This makes data visualization and data representation very easy.
c) From concise charts to elaborate, interactive flow diagrams, all are well within R’s repertoire (see Fig 1).
Fig 1: Data Visualization in R
3. Highly Active Community
a) R’s open-source libraries are supported by a growing number of users.
b) The R environment is continuously growing. This growth is due to its large user base.
4. A Wide Selection of Packages
a) CRAN, the Comprehensive R Archive Network, houses more than 10,000 packages and extensions that help solve all sorts of problems in data science.
b) Whether you need high-quality interactive graphics, web application development, quantitative analysis or machine learning procedures, there is a package for every scenario.
c) R has packages for many disciplines, such as astronomy and biology. While R was originally used for academic purposes, it is now used in industry as well.
5. Comprehensive Environment
a) R has a very comprehensive development environment, meaning it supports statistical computing as well as software development.
b) R supports object-oriented programming. It also has a robust package called shiny, which can be used to produce full-fledged web apps.
c) Combined with data analysis and data visualization, R can be used for highly interactive, online, data-driven storytelling.
6. Can Perform Complex Statistical Calculations
a) R can perform simple and complex mathematical and statistical calculations on a wide variety of data objects.
b) It can also perform such operations on large data sets.
7. Distributed Computing
a) In distributed computing, tasks are split between multiple processing nodes to reduce processing time and increase efficiency.
b) R has packages like ddR and multidplyr that enable it to use distributed computing to process large data sets.
8. Running Code without a Compiler
a) R is an interpreted language, which means that it does not need a compiler to turn the code into a program.
b) R directly interprets the provided code into lower-level calls and pre-compiled code.
9. Interfacing with Databases
R has several packages that enable it to interact with databases, such as ROracle, RMySQL and packages for the Open Database Connectivity (ODBC) protocol.
10. Data Variety
R can handle a variety of structured and unstructured data. It also provides various data modeling and data operation facilities due to its interaction with databases.
11. Machine Learning
R can be used for machine learning as well. It is best suited to machine learning when exploring data or building one-off models.
12. Data Wrangling
a) Data wrangling is the process of cleaning complex and inconsistent data sets to enable convenient computation and further analysis. It is a very time-consuming process.
b) R, with its extensive library of tools, can be used for database manipulation and wrangling.
13. Cross-platform Support
R is machine-independent and supports cross-platform operation, so it can be used on many different operating systems.
14. Compatible with Other Programming Languages
While most of its functions are written in R itself, C, C++ or FORTRAN can be used for computationally heavy tasks. Java, .NET, Python, C, C++ and FORTRAN can also be used to manipulate objects directly.
15. Data Handling and Storage
R integrates with many data storage formats, which makes data handling easy.
16. Vector Arithmetic
a) Vectors are the most basic data structure in R, and most other data structures are derived from vectors.
b) R uses vectors and vector arithmetic, so it does not need much explicit looping to process a large set of values. This makes R code more concise and often more efficient.
17. Compatibility with Other Data Processing Technologies
a) R can be easily paired with other data processing and distributed computing technologies like Hadoop and Spark. It is possible to remotely use a Spark cluster to process large datasets using R.
b) R and Hadoop can be paired as well, combining Hadoop’s large-scale data processing and distributed computing capabilities with R’s statistical computing power.
18. Generates Reports in Any Desired Format
a) R Markdown (the rmarkdown package) covers most report generation needs when working with R. It can produce web pages.
b) It can also generate reports as Word documents or PowerPoint presentations, all with your R code and results embedded in them.
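The vector arithmetic described under feature 16 can be sketched as follows. The vectors prices and quantities are hypothetical values chosen for illustration.

```r
# Hypothetical data for illustration
prices <- c(2.5, 4.0, 1.5)
quantities <- c(4, 1, 6)

# Element-wise multiplication, no explicit loop needed
totals <- prices * quantities
print(totals)

# The same computation with an explicit loop, for comparison
totals_loop <- numeric(length(prices))
for (i in seq_along(prices)) {
  totals_loop[i] <- prices[i] * quantities[i]
}

# Both approaches give the same values
print(identical(totals, totals_loop))
```

The vectorized form is shorter and leaves the looping to R's internal, pre-compiled code.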
Q.3. What are the unique features of R programming?
Answer: Due to a large number of packages available, there are many other handy features as well:
2. R can pull data from APIs, servers, SPSS files, and many other formats.
3. R is useful for web scraping.
4. It can perform multiple complex mathematical operations with a single command.
5. Using R Markdown, it can create attractive reports that combine plain text with code and visualizations of the results.
6. Due to a large number of researchers and statisticians using it, new ideas and technologies often appear in the R community first.
Q.4. Explain the types of R atomic vector.
Answer: There are four common types of R atomic vectors:
1. Numeric Data Type
Decimal values are referred to as numeric data types in R. If we assign a decimal value to a variable g, then g becomes a numeric type.
2. Integer Data Type
A numeric value with no fractional part is an integer, represented by the class "integer". -54 and 23 are two examples of integers. R stores each integer in 4 bytes (32 bits). There are two ways to assign an integer to a variable:
a) The first way is to use the as.integer() function
b) The second way is to append L to the value
3. Character Data Type
A character value holds a string of text. There are two ways to create a character data type value in R:
a) The first method is to type a string between quotation marks " "
b) The second is to convert a number into a character value with the as.character() function
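The numeric, integer and character types described so far can be checked with class(). This is a minimal sketch; the values and the variable names i1, i2, s1 and s2 are chosen for illustration.

```r
g <- 62.4                  # a decimal value is numeric
class(g)                   # "numeric"

i1 <- as.integer(23)       # integer created with as.integer()
i2 <- -54L                 # integer created with the L suffix
class(i1)                  # "integer"
class(i2)                  # "integer"

s1 <- "hello"              # character value typed between quotes
s2 <- as.character(62.4)   # number converted to a character value
class(s2)                  # "character"
```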
4. Logical Data Type
A logical data type takes one of two values, TRUE or FALSE, depending on whether a condition is satisfied.
Q.5. Explain 1. Windows installation of R 2. Linux installation of R
Answer:
1. Windows installation of R:
a) You can download the Windows installer version of R from R-3.2.2 for Windows (32/64 bit) and save it in a local directory.
b) It is a Windows installer (.exe) with a name of the form "R-version-win.exe". You can just double-click and run the installer, accepting the default settings.
c) If your Windows is a 32-bit version, it installs the 32-bit version. If your Windows is 64-bit, it installs both the 32-bit and 64-bit versions.
d) After installation, you can locate the icon to run the program in the directory structure "R\R3.2.2\bin\i386\Rgui.exe" under the Windows Program Files. Clicking this icon brings up the R-GUI, which is the R console for R programming.
2. Linux installation of R:
a) R is available as a binary for many versions of Linux at the location R Binaries.
b) The instructions for installing R on Linux vary from flavor to flavor. The steps are listed under each type of Linux version at the mentioned link. However, if you are in a hurry, you can use the yum command to install R as follows:
c) $ yum install R
d) The above command installs the core functionality of R along with the standard packages. If you still need an additional package, you can launch the R prompt.
e) You can then use the install command at the R prompt to install the required package.
Q.6. What are the applications of subsetting data?
Answer:
1. Duplicate data can be removed during analysis using the duplicated() function in R.
2. The duplicated() function finds duplicate values and returns a logical vector that tells you whether each value is a duplicate of a previous value.
3. For every value in the sample that is a duplicate, TRUE is returned.
4. Missing data can be identified using the complete.cases() function in R.
5. The complete.cases() function finds rows that are complete. It returns a logical vector with the value TRUE for rows that are complete and FALSE for rows that have some NA values.
6. Rows that have NA values can be removed using the na.omit() function, as below:
row_name <- na.omit(file_name)
Q.7. Explain Basic GUI of R
Answer:
2. RGui gives you some tools to manage your R environment, most importantly a console window.
3. The console is where you type instructions, or scripts, and generally get R to do useful things for you.
4. The standard installation process creates useful menu shortcuts (although this may not be true if you use Linux, because there is no standard RGui editor for Linux).
5. In the menu system, look for a folder called R, and then find an icon called R followed by a version number.
6. When you open RGui for the first time, you see the R Console screen, which lists some basic information such as your version of R and the licensing conditions.
7. Below all this information is the R prompt, denoted by a > symbol. The prompt indicates where you type your commands to R; you see a blinking cursor to the right of the prompt.
8. Use the console to issue a very simple command to R. R responds immediately to your command.
9. One of the clever things about R is that it can calculate many values at the same time, which is called vector operations. All you need to know is that R can handle more than one value at a time.
10. To quit your R session, type q() in the console, after the command prompt (>).
11. R asks you a question to make sure that you meant to quit. Click No, because you have nothing to save. This action closes your R session (as well as RGui, if you’ve been using RGui as your code editor).
Q.8. How can you access elements of R vectors?
Answer: With the help of vector indexing, we can access the elements of vectors. Indexing denotes the position where the values in a vector are stored. Indexing can be performed with integer, character or logical vectors.
1. Indexing with Integer Vector
Unlike many programming languages such as Python, C++ and Java, where indexing starts from 0, the indexing of vectors in R starts with 1. We can perform indexing by specifying an integer value in square brackets [ ] next to our vector.
2. Indexing with Character Vector
Character vector indexing can be done on a named vector by giving element names in the square brackets.
3. Indexing with Logical Vector
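A minimal sketch of the three indexing styles covered in this answer (integer, character and logical). The vector scores and its element names are hypothetical.

```r
# Hypothetical named vector for illustration
scores <- c(math = 90, art = 75, music = 82)

# Integer indexing: positions start at 1
scores[1]          # first element (math)
scores[c(1, 3)]    # first and third elements

# Character indexing: select by element name
scores["art"]

# Logical indexing: elements where the index is TRUE are kept
scores[c(TRUE, FALSE, TRUE)]   # elements 1 and 3
scores[scores > 80]            # same idea, using a condition
```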
In logical indexing, the elements whose corresponding position in the logical vector is TRUE are returned. For example, indexing with c(TRUE, FALSE, TRUE) returns the elements at positions 1 and 3, where the logical vector is TRUE.
Q.9. Give some of the operations on R vectors.
Answer:
1. Combining Vector in R
2. Arithmetic Operations on Vectors in R
3. Logical Index Vector in R
4. Numeric Index
5. Duplicate Index
6. Range Indexes
7. Out-of-order Indexes
8. Named Vectors Members
Q.10. Write the functions for “Reading data in R”.
Answer: There are a few very useful functions for reading data into R.
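As a hedged sketch, three base R functions commonly used for reading data are read.csv(), read.table() and readLines(). The original list of functions is not reproduced in these notes, so this selection is an assumption. The example first writes a small temporary file with made-up contents so that it is self-contained.

```r
# Create a small temporary CSV file so the example is self-contained
path <- tempfile(fileext = ".csv")
writeLines(c("name,age", "Ann,31", "Bob,27"), path)

# read.csv(): comma-separated files with a header row
df1 <- read.csv(path)

# read.table(): general delimited files; header and separator given explicitly
df2 <- read.table(path, header = TRUE, sep = ",")

# readLines(): raw text, one string per line
raw_lines <- readLines(path)

str(df1)
unlink(path)  # remove the temporary file
```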
1. Vectors in R
a) Vectors are ordered containers of primitive elements and are used for one-dimensional data.
b) Types: integer, numeric, logical, character, complex
2. Matrices in R
a) Matrices are rectangular collections of elements and are useful when all data is of a single class, such as numeric or character.
b) Dimensions: a matrix has two; arrays extend this to three or more
3. Lists in R
a) Lists are ordered containers for arbitrary elements and are used for higher-dimensional data, such as an organization’s customer information.
b) When data cannot be represented as an array or a data frame, a list is the best choice. Lists can contain all kinds of other objects, including other lists or data frames, and in that sense they are very flexible.
4. Data frames in R
A data frame is a two-dimensional container for records and variables, used for representing data from spreadsheets and similar sources. It is similar to a single table in a database.
Q.13. Explain different types of merge in R
Answer: The merge() function allows four ways of combining data:
1. Natural join in R
To keep only rows that match from the data frames, specify the argument all=FALSE
2. Full outer join in R
To keep all rows from both data frames, specify all=TRUE
3. Left outer join in R
To include all the rows of your data frame x and only those from y that match, specify all.x=TRUE
4. Right outer join in R
To include all the rows of your data frame y and only those from x that match, specify all.y=TRUE
Q.14. Give some operators used in R
Answer: Some of the frequently used operators in R model formulas are:
Operator | Example | Meaning |
~ | y ~ x | Model y as a function of x |
+ | y ~ a + b | Include columns a and b |
- | y ~ a - b | Include a but exclude b |
: | y ~ a : b | Estimate the interaction of a and b |
* | y ~ a * b | Include columns a and b as well as their interaction |
| | y ~ a | b | Estimate y as a function of a conditional on b |
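The formula operators in the table can be tried with lm() on the built-in mtcars data set. This is a minimal sketch; the choice of mtcars and of the columns mpg, wt and hp is an assumption for illustration.

```r
# y ~ a + b: include both columns as predictors
fit_add <- lm(mpg ~ wt + hp, data = mtcars)

# y ~ a : b: only the interaction term of a and b
fit_int <- lm(mpg ~ wt:hp, data = mtcars)

# y ~ a * b: both columns plus their interaction (same as a + b + a:b)
fit_full <- lm(mpg ~ wt * hp, data = mtcars)

# The coefficient names show which terms each formula produced
names(coef(fit_add))
names(coef(fit_int))
names(coef(fit_full))
```

Comparing the coefficient names is a quick way to confirm which terms a formula expands to before interpreting a model.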