Friday, 27 September 2013

R rownames(foo[bar]) prints as null but can be successfully changed - why?

I've written a script that works on a set of gene-expression data. I'll try
to split my post into a short question and a rather lengthy explanation
(sorry about the long block of text). I hope the short question makes
sense on its own; the long explanation is only there in case I don't get
the point across in the short one.
While trying to acquire basic R skills, I ran into something that puzzles
me, and I didn't find any enlightenment via Google. I really don't
understand this, and I hope that by clarifying what is happening here I
can understand R better. That said, I'm not a programmer, so please bear
with my bad code.
SHORT QUESTION:
When I have rownames(foo), e.g.
> print(rownames(foo))
"a" "b" "c" "d"
and I try to access one of them via print(rownames(foo[bar])), it prints NULL. E.g.
> print(rownames(foo[2]))
NULL
Here, in the second answer, Richie Cotton explains this as "[...] that where
there aren't any names, [...]". To me this would indicate that either
rownames(foo) is empty (which is clearly not the case, as I can print it
with print(rownames(foo))) or that this method of access fails.
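A minimal sketch of what the two access forms do, assuming foo is a matrix (the NULL results and the quoted "116727" further down suggest expr.table is a character matrix; the 4 x 2 foo here is a hypothetical stand-in). With a single index, foo[2] treats the matrix as one long vector in column-major order and returns one element with no dim attribute, so there are no rownames to report. Indexing the rownames vector itself, rownames(foo)[2], is a different expression.

```r
# Hypothetical 4 x 2 character matrix standing in for foo
foo <- matrix(c("1", "2", "3", "4", "w", "x", "y", "z"), nrow = 4,
              dimnames = list(c("a", "b", "c", "d"), c("Entrez", "Other")))

print(rownames(foo))     # "a" "b" "c" "d"
print(foo[2])            # "2": element 2, counting down column 1 first
print(foo[6])            # "x": element 6 is row 2 of column 2 (column-major order)
print(rownames(foo[2]))  # NULL: a single element has no dim, hence no rownames
print(rownames(foo)[2])  # "b": extract the rownames first, then index them
```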
However, when I try to change the value at position bar, I get a warning
message saying that the replacement length doesn't match. Nevertheless,
the operation succeeds, which pretty much proves that this method of
access does work. E.g.
> bar = 2
> rownames(foo[bar]) = some.vector(rab)
> print(rownames(foo[bar]))
NULL
> print(rownames(foo))
"a" "something else" "c" "d"
Why is this working? Obviously the function can't properly access
position bar in foo, as it prints it as empty. Why the heck does it still
replace the value successfully instead of failing in a horrific way? Or,
asked the other way around: if it successfully replaces the value at this
position, why doesn't print return that value?
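In R, these two assignment forms behave differently. A minimal sketch, assuming a hypothetical 4-row matrix foo: rownames(foo)[bar] <- value edits the rownames vector and then stores it back into foo, while rownames(foo[bar]) <- value tries to set rownames on the dimensionless element foo[bar], which R refuses with an error. The script in the long background explanation uses the first form, rownames(expr.table)[no.symbol.idx] <- ....

```r
# Hypothetical 4 x 2 numeric matrix standing in for foo
foo <- matrix(1:8, nrow = 4, dimnames = list(c("a", "b", "c", "d"), NULL))
bar <- 2

# Form 1: modify the rownames vector, then write it back into foo
rownames(foo)[bar] <- "something else"
print(rownames(foo))  # "a" "something else" "c" "d"

# Form 2: try to set rownames on foo[bar], a single element with no dimensions
result <- tryCatch({
  rownames(foo[bar]) <- "other"
  "succeeded"
}, error = function(e) conditionMessage(e))
print(result)  # an error message; foo keeps the rownames set by form 1
```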
LONG BACKGROUND EXPLANATION:
The data source contains the row number, the Entrez ID of the gene, the
official gene symbol, the Affymetrix probe ID and then the fold-change
(increase or decrease) values. It looks something like this:
No  Entrez  Symbol  Probe_id    Sample1_FoldChange  Sample2_FoldChange
1   690244  Sumo2   1367452_at  1.02                0.19
Later, when displaying the data, I want it to print only the gene symbol
and the fold changes. If there is no gene symbol in the data set, it is
printed as "n/a", which is of no value to me, as I can't tell which of
many genes it is. So I added a first processing step that, only in these
cases, replaces "n/a" with "n/a (12345)", where 12345 is the Entrez ID.
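The relabeling step described above can also be written in vectorized form. This is a minimal sketch, not the author's script: the one-column matrix below is hypothetical toy data (only the Sumo2 row and its ID come from the post; 12345 and 67890 are made-up IDs), and paste0 recycles the fixed prefix and suffix so exactly one label is produced per row without a symbol.

```r
# Hypothetical toy data; only the Sumo2 row is taken from the post
expr.table <- matrix(c("690244", "12345", "67890"), ncol = 1,
                     dimnames = list(c("Sumo2", "n/a", "n/a"), "Entrez"))

no.symbol.idx <- which(rownames(expr.table) == "n/a")

# One label per missing symbol: the prefix and suffix are recycled by paste0
new.labels <- paste0("n/a (", expr.table[no.symbol.idx, "Entrez"], ")")
rownames(expr.table)[no.symbol.idx] <- new.labels

print(rownames(expr.table))  # "Sumo2" "n/a (12345)" "n/a (67890)"
```

Because new.labels has the same length as no.symbol.idx, the replacement lengths match and no warning is raised.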
I've written the following script to do this. (Note: as I'm not a
programmer and I am new to R, I doubt that it is pretty code. But that's
not the point I want to discuss.)
no.symbol.idx <- which(rownames(expr.table) == "n/a")
c1 <- character(length(rownames(expr.table)))
c2 <- c1
for (x in 1:length(c1))
{
  c1[x] <- "n/a ("
}
for (x in 1:length(c2))
{
  c2[x] <- ")"
}
rownames(expr.table)[no.symbol.idx] <- paste(c1, (expr.table[no.symbol.idx,
  "Entrez"]), c2, sep="")
The script works and does what it should. However, I get the following
warning message:
Warning message:
In rownames(expr.table)[no.symbol.idx] <- paste(c1,
(expr.table[no.symbol.idx, :
number of items to replace is not a multiple of replacement length
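A hedged reading of this warning, reconstructed from the post's own numbers (15 rows, one "n/a" row at position 15, Entrez ID 116727): c1 and c2 are built with one entry per row of the table, so paste() returns 15 labels for a single slot. R warns, keeps the first label, and because all 15 labels are identical, the result still looks correct. The rep() calls below are assumed equivalents of the two for loops, and labels is a hypothetical stand-in for rownames(expr.table).

```r
n <- 15                 # number of rows, as in the printed output
no.symbol.idx <- 15     # position of the single "n/a" row
entrez <- "116727"      # Entrez ID printed for that row

c1 <- rep("n/a (", n)   # same result as the first for loop
c2 <- rep(")", n)       # same result as the second for loop
replacement <- paste(c1, entrez, c2, sep = "")

print(length(no.symbol.idx))   # 1 slot to fill
print(length(replacement))     # 15 values offered
print(unique(replacement))     # all 15 values are the same string

labels <- c(rep("gene", n - 1), "n/a")   # hypothetical stand-in for rownames(expr.table)
labels[no.symbol.idx] <- replacement     # warns: number of items to replace is not a multiple of replacement length
print(labels[no.symbol.idx])             # "n/a (116727)": the first of the 15 values was used
```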
To find out what was happening here, I added some print output to the script:
no.symbol.idx <- which(rownames(expr.table) == "n/a")
c1 <- character(length(rownames(expr.table)))
c2 <- c1
for (x in 1:length(c1))
{
  c1[x] <- "n/a ("
}
for (x in 1:length(c2))
{
  c2[x] <- ")"
}
print("print(rownames(expr.table)):")
print(rownames(expr.table))
print("print(no.symbol.idx):")
print(no.symbol.idx)
print("print(rownames(expr.table[no.symbol.idx])):")
print(rownames(expr.table[no.symbol.idx]))
print("print(rownames(expr.table[14])):")
print(rownames(expr.table[14]))
print("print(rownames(expr.table[15])):")
print(rownames(expr.table[15]))
cat("print(expr.table[no.symbol.idx,\"Entrez\"]):\n")
print(expr.table[no.symbol.idx,"Entrez"])
rownames(expr.table)[no.symbol.idx] <- paste(c1, (expr.table[no.symbol.idx,
  "Entrez"]), c2, sep="")
print("print(rownames(expr.table)):")
print(rownames(expr.table))
print("print(rownames(expr.table[no.symbol.idx])):")
print(rownames(expr.table[no.symbol.idx]))
And I get the following output in the console.
[1] "print(rownames(expr.table)):"
[1] "Sumo2" "Cdc37" "Copb2" "Vcp" "Ube2d3" "Becn1" "Lypla2" "Arf1"
"Gdi2" "Copb1" "Capns1" "Phb2" "Puf60" "Dad1" "n/a"
[1] "print(no.symbol.idx):"
[1] 15
[1] "print(rownames(expr.table[no.symbol.idx])):"
NULL
[1] "print(rownames(expr.table[14])):"
NULL
[1] "print(rownames(expr.table[15])):"
NULL
... (to be continued) So no.symbol.idx clearly gets the right position
for the n/a value. When I try to print it, however, R claims that the
rowname at this position is empty and returns NULL. When I access this
position "by hand" with expr.table[15], it also returns NULL. This,
however, has nothing to do with the n/a value, as the same holds true for
the value stored at position 14.
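A minimal sketch of the row-wise alternatives, assuming expr.table is a matrix (the 3 x 2 m below is a hypothetical stand-in). expr.table[15] is a single-element lookup, and even row indexing like m[3, ] drops to a plain vector named by the column names, so rownames() returns NULL in both cases. Passing drop = FALSE keeps the one-row matrix together with its row name.

```r
# Hypothetical 3 x 2 matrix standing in for expr.table
m <- matrix(1:6, ncol = 2,
            dimnames = list(c("Sumo2", "Dad1", "n/a"), c("Entrez", "FoldChange")))

print(rownames(m[3]))                   # NULL: single element, no dimensions
print(m[3, ])                           # row 3 as a vector named by the column names
print(rownames(m[3, ]))                 # NULL: a plain vector has no rownames
print(rownames(m[3, , drop = FALSE]))   # "n/a": a 1 x 2 matrix keeps its row name
```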
... (the continuation)
print(expr.table[no.symbol.idx,"Entrez"]):
[1] "116727"
[1] "print(rownames(expr.table)):"
[1] "Sumo2" "Cdc37" "Copb2" "Vcp" "Ube2d3" "Becn1" "Lypla2" "Arf1" "Gdi2"
[10] "Copb1" "Capns1" "Phb2" "Puf60" "Dad1" "n/a (116727)"
[1] "print(rownames(expr.table[no.symbol.idx])):"
NULL
And this is the result that surprises me: despite all this, it works.
R claims everything is NULL, yet the operation succeeds. I don't
understand this.
