    2008-04-20 00:01:27.726517,5,11111
    2008-04-20 00:02:10.170999,4,11011
    2008-04-20 00:03:01.856159,4,11011
    2008-04-20 00:03:33.333176,4,10111
    . . .

and create a data frame with columns year, day, month, etc.
Solution: First, using readLines(), we load the file into a vector of character strings, where each line is one element of the vector.

    > lines <- readLines("timestampdata.txt")
    > lines
    [1] "2008-04-20 00:01:27.726517,5,11111" "2008-04-20 00:02:10.170999,4,11011"
    [3] "2008-04-20 00:03:01.856159,4,11011" "2008-04-20 00:03:33.333176,4,10111"
    [5] "2008-04-20 00:04:06.451844,4,11011" ...

Now we divide and conquer. We create a function, makerow(), which converts one line (one character string of the form "2008-04-20 00:01:27.726517,5,11111") into a vector of numbers 2008 4 20 0 1 27.726517 5 11111.
We use strsplit() to break up the string into a vector of substrings
on each occurrence of a delimiter "-", " ", ":" or ",".
The regular expression "[-: ,]" matches any one character from the set "-", " ", ":" or ",".
The substrings are converted to numeric data type by sapply-ing the
function as.numeric() to the result of strsplit().
Because strsplit() returns a list, sapply() yields a one-column matrix, which as.vector() then coerces back into a simple vector.
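To see the individual steps in isolation, here is a small sketch that runs them one at a time on a single line from the sample data. The variable names pieces, m and nums are only illustrative.

```r
# One line of the sample data
x <- "2008-04-20 00:01:27.726517,5,11111"

# Step 1: split on "-", ":", " " or ","; strsplit() returns a list of length 1
pieces <- strsplit(x, "[-: ,]")
pieces[[1]]          # 8 character substrings

# Step 2: sapply() over that one-element list gives an 8 x 1 matrix
m <- sapply(pieces, as.numeric)
dim(m)

# Step 3: as.vector() drops the matrix shape, leaving a plain numeric vector
nums <- as.vector(m)
nums
```

Note that fields such as "00" and "04" become the numbers 0 and 4, so leading zeros do not survive the conversion.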
Putting these steps together gives makerow():

    > makerow <- function(x) as.vector(sapply(strsplit(x,"[-: ,]"),as.numeric))

Now it's just a matter of applying this function to every element of the vector "lines" that we already read in. For example, this produces a matrix close to what we want:

    > t(sapply(lines,makerow,USE.NAMES=F))
         [,1] [,2] [,3] [,4] [,5]      [,6] [,7]  [,8]
    [1,] 2008    4   20    0    1 27.726517    5 11111
    [2,] 2008    4   20    0    2 10.170999    4 11011
    [3,] 2008    4   20    0    3  1.856159    4 11011
    [4,] 2008    4   20    0    3 33.333176    4 10111
    [5,] 2008    4   20    0    4  6.451844    4 11011
    ...

(sapply has a habit of giving us the transpose of what we want, so we just reverse it using the transpose function t(). Also, USE.NAMES=F prevents sapply from creating unnecessary and ugly names.)
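If the transpose feels awkward, one alternative is to collect the rows with lapply() and stack them with rbind(). This is only a sketch and not part of the original solution. The names m_rbind and m_t are illustrative, and two sample lines are inlined so the example runs without the data file.

```r
# Two sample lines, inlined so the example does not need timestampdata.txt
lines <- c("2008-04-20 00:01:27.726517,5,11111",
           "2008-04-20 00:02:10.170999,4,11011")

# Same makerow() as in the text
makerow <- function(x) as.vector(sapply(strsplit(x, "[-: ,]"), as.numeric))

# lapply() returns a list of numeric vectors; rbind() stacks them as rows
m_rbind <- do.call(rbind, lapply(lines, makerow))

# The transpose-based version from the text, for comparison
m_t <- t(sapply(lines, makerow, USE.NAMES = FALSE))

dim(m_rbind)              # one row per line, eight columns
all.equal(m_rbind, m_t)   # both approaches give the same matrix
```

Both versions yield one row per input line, so which to use is largely a matter of taste.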
We're almost there. All we need is to make this a data frame and name the columns:

    > d <- data.frame( t(sapply(lines,makerow,USE.NAMES=F)) )
    > names(d) <- c("year","month","day","hour","min","sec","nfired","which")
    > d
      year month day hour min       sec nfired which
    1 2008     4  20    0   1 27.726517      5 11111
    2 2008     4  20    0   2 10.170999      4 11011
    3 2008     4  20    0   3  1.856159      4 11011
    4 2008     4  20    0   3 33.333176      4 10111
    5 2008     4  20    0   4  6.451844      4 11011
    ...

and voila, we're ready to start analyzing...
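One caveat before analyzing: every field passes through as.numeric(), so leading zeros are lost. This matters if the last field is a pattern of 0/1 flags. That is an assumption, suggested by nfired matching the number of 1s in each sample row. The sketch below uses a hypothetical pattern that does not appear in the sample data and assumes patterns are always five digits wide.

```r
# Hypothetical pattern with a leading zero; not taken from the sample file
x <- "01111"
as.numeric(x)        # the leading zero is dropped

# Assumption: patterns are always five digits wide, so pad back to width 5
which_chr <- sprintf("%05.0f", as.numeric(x))
which_chr

# Count the 1s in the pattern, e.g. to compare against nfired
n_ones <- sum(strsplit(which_chr, "")[[1]] == "1")
n_ones
```

Under the same assumption, the column in the data frame could be restored with d$which <- sprintf("%05.0f", d$which).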