Recovering a folder on an LVM ext3 partition


microx

I just had a sudden break down of my hard disk running Linux, don't
know what the reason is, but the result is that all the partitions in
an LVM group are now seriously corrupt. The HD layout looks something
like this:

hda1 - primary Windows partition
hda2 - primary ext3 for /boot
hda3 - LVM containing 4 logical ext3 partitions: / (root), /home,
/usr/local, swap

It's strange that LVM itself reports no errors, but the logical
partitions are all screwed, while the 2 partitions not inside LVM are
untouched. Anyone have a clue why that is?

Anyway, obviously the partition I care most about is /home. I booted
from a LiveCD and I tried doing fsck on the root first as a trial, and
ended up with a working, but empty, partition. :) So I proceeded with
more caution on the /home partition, I answered "no" to any questions
about "force rewrite", but I let it move things to lost+found. At the
end, I attempted to mount the partition, it worked, but one top-level
folder which was called "data" is gone. At one point during fsck, I
noticed it asking about clearing this folder's extra inodes or something
like that. This folder contained tons of photo files, as well as some
other media and documents. There are lots of files in the lost+found,
and some folders, so I assumed the "data" folder just might be one of
those folders, however corrupted. However, if I try to cd into any of
them, I just get thrown back to the home folder (the home of the
LiveCD, not the one I'm recovering). Is there any way I can try to
recover these folders?
 

Arno Wagner

Previously microx said:
I just had a sudden break down of my hard disk running Linux, don't
know what the reason is, but the result is that all the partitions in
an LVM group are now seriously corrupt. The HD layout looks something
like this:
hda1 - primary Windows partition
hda2 - primary ext3 for /boot
hda3 - LVM containing 4 logical ext3 partitions: / (root), /home,
/usr/local, swap
It's strange that LVM itself reports no errors, but the logical
partitions are all screwed, while the 2 partitions not inside LVM are
untouched. Anyone have a clue why that is?

Can you be a bit more specific about what "screwed" means? Also:
what kernel version, and what were you doing when the corruption
occurred?

Is it possible that you accidentally overwrote parts of hda3? The
LVM descriptor block is at the end of the volume, so it would be
the last thing to go.

Anyway, obviously the partition I care most about is /home. I booted
from a LiveCD and I tried doing fsck on the root first as a trial, and
ended up with a working, but empty, partition. :) So I proceeded with
more caution on the /home partition, I answered "no" to any questions
about "force rewrite", but I let it move things to lost+found. At the
end, I attempted to mount the partition, it worked, but one top-level
folder which was called "data" is gone. At one point during fsck, I
noticed it asking about clearing this folder's extra inodes or something
like that. This folder contained tons of photo files, as well as some
other media and documents. There are lots of files in the lost+found,
and some folders, so I assumed the "data" folder just might be one of
those folders, however corrupted. However, if I try to cd into any of
them, I just get thrown back to the home folder (the home of the
LiveCD, not the one I'm recovering). Is there any way I can try to
recover these folders?

Hmm. Since you worked on the original partitions without backup, there
is a good possibility that you destroyed additional data. Also
there will be only files in lost+found, not directories. The
behaviour you observe is correct (depending on shell settings).

Advice: Stop writing to the disk immediately. Make a sector image and
work on that. Then start looking for your files in lost+found. Other
steps depend on what caused the corruption, but stop writing on the
original immediately!
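
A minimal sketch of the "make a sector image" step, assuming GNU dd is available on the LiveCD. The function name and the example paths are hypothetical; in the LVM case the source would be the device node of the damaged logical volume rather than hda3 itself. On media with real read errors a dedicated tool such as GNU ddrescue is often preferred.

```shell
# Copy a block device (or any file) into an image file, block by block.
# conv=noerror keeps going after read errors; conv=sync pads short or
# unreadable blocks with zeros so later data stays at the right offset.
make_sector_image() {
    source_device=$1   # device to rescue (hypothetical example: an LV device node)
    image_file=$2      # target file, which must live on a different, healthy disk
    dd if="$source_device" of="$image_file" bs=4096 conv=noerror,sync
}
```

All further fsck or recovery attempts can then be pointed at the image file, so a failed attempt costs nothing on the original.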

I am also curious what prompted the use of LVM for this setup. Usually
you would just do a conventional extended partition, using LVM does not
really make sense.

Additional advice: It is Linux and pretty corruption resistant, but
backup is non-optional for important files, no matter what.

Arno
 

Arno Wagner

Previously microx said:
I just had a sudden break down of my hard disk running Linux, don't
know what the reason is, but the result is that all the partitions in
an LVM group are now seriously corrupt. The HD layout looks something
like this:
hda1 - primary Windows partition
hda2 - primary ext3 for /boot
hda3 - LVM containing 4 logical ext3 partitions: / (root), /home,
/usr/local, swap
It's strange that LVM itself reports no errors, but the logical
partitions are all screwed, while the 2 partitions not inside LVM are
untouched. Anyone have a clue why that is?

P.S.: LVM does not know what is in the partitions. You can overwrite
the whole LVM volume except the last few blocks with zeros or random
data and it will still look fine to LVM. That behaviour is expected
and correct.

As to the reasons: your root partition seems not to have any
data left, and directories, files and backup superblocks are
distributed all over the disk, so you very likely
overwrote the beginning of hda3 somehow. If
/usr/local is intact, that would be further evidence.
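
To illustrate how the backup superblocks are spread out, here is a rough sketch that lists where they would sit. It assumes the sparse_super feature (backups in block groups 1 and powers of 3, 5 and 7) and the default of 8 * block size blocks per group; the function names are made up for this example, and a file system created with other options will differ.

```shell
# Succeeds if block group $1 carries a backup superblock under sparse_super:
# group 1, or a power of 3, 5 or 7.
is_backup_group() {
    g=$1
    [ "$g" -eq 1 ] && return 0
    for base in 3 5 7; do
        n=$g
        while [ $((n % base)) -eq 0 ]; do n=$((n / base)); done
        [ "$n" -eq 1 ] && return 0
    done
    return 1
}

# Print the block numbers of the backup superblocks.
backup_superblocks() {
    block_size=$1      # file system block size in bytes: 1024, 2048 or 4096
    group_count=$2     # number of block groups in the file system
    blocks_per_group=$((block_size * 8))
    first_data_block=0
    # With 1 KiB blocks the boot block occupies block 0, so data starts at 1.
    [ "$block_size" -eq 1024 ] && first_data_block=1

    group=1
    while [ "$group" -lt "$group_count" ]; do
        if is_backup_group "$group"; then
            echo $((group * blocks_per_group + first_data_block))
        fi
        group=$((group + 1))
    done
}
```

For a 4 KiB block size the first backups land at blocks 32768, 98304 and 163840. One of these can be handed to e2fsck with its -b option (together with -n for a read-only check) when the primary superblock at the start of the volume is gone.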

Arno
 

Svend Olaf Mikkelsen

I just had a sudden break down of my hard disk running Linux, don't
know what the reason is, but the result is that all the partitions in
an LVM group are now seriously corrupt. The HD layout looks something
like this:

hda1 - primary Windows partition
hda2 - primary ext3 for /boot
hda3 - LVM containing 4 logical ext3 partitions: / (root), /home,
/usr/local, swap

It's strange that LVM itself reports no errors, but the logical
partitions are all screwed, while the 2 partitions not inside LVM are
untouched. Anyone have a clue why that is?

Anyway, obviously the partition I care most about is /home. I booted
from a LiveCD and I tried doing fsck on the root first as a trial, and
ended up with a working, but empty, partition. :) So I proceeded with
more caution on the /home partition, I answered "no" to any questions
about "force rewrite", but I let it move things to lost+found. At the
end, I attempted to mount the partition, it worked, but one top-level
folder which was called "data" is gone. At one point during fsck, I
noticed it asking about clearing this folder's extra inodes or something
like that. This folder contained tons of photo files, as well as some
other media and documents. There are lots of files in the lost+found,
and some folders, so I assumed the "data" folder just might be one of
those folders, however corrupted. However, if I try to cd into any of
them, I just get thrown back to the home folder (the home of the
LiveCD, not the one I'm recovering). Is there any way I can try to
recover these folders?

The fsck programs should not be used when data are lost. Anyway, you
could try ext3 data recovery programs. There is a free Windows program
Readext2 at

http://www.partitionsupport.com/utilities.htm

To use the program you need to know the 0-based cylinder number of a
superblock from the partition. If that is, for example, 6000, and the
disk is the first disk, the Readext2 command to examine the partition
is:

readext2 1 6000 6000

Or to search a cylinder area for a superblock to use (example):

readext2 1 6000 7000
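
If only a sector number is known (for example a partition's start sector as reported by a partitioning tool), a cylinder number can be estimated as sketched below. This assumes the common translated geometry of 255 heads and 63 sectors per track; the geometry Readext2 actually uses on a given disk may differ, and the function name is hypothetical.

```shell
# Convert an absolute sector number (LBA) to a 0-based cylinder number.
# heads and sectors_per_track default to 255 and 63; both are assumptions
# and should be checked against the geometry reported for the actual disk.
cylinder_of_sector() {
    lba=$1
    heads=${2:-255}
    sectors_per_track=${3:-63}
    echo $(( lba / (heads * sectors_per_track) ))
}
```

Since the result is only an estimate, searching a range of cylinders around it, as in the second command above, is the safer choice.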
 

microx

Can you be a bit more specific what "screwed" means? Also:
What kernel version, what were you doing when the corruption
occurred?

I can try to be more specific. :) Basically, it won't mount saying it
has a "wrong filesystem" and after fsck, I find I've lost some files. I
wasn't doing anything special at the time, it's quite strange. I had
just clicked a button to start Firefox, and while it was opening, I
deleted 1 file that I had on the desktop and no longer needed. That's
where it all started when it gave me an error saying that I'm trying to
delete a file on a read only disk, so I thought to try logging out to
see if /home was remounted read-only for some reason, but then X
wouldn't start up again. So I tried restarting the computer, but then
it gave an error during boot-up that it can't mount the root
filesystem, so then I booted with the LiveCD to see what's going on. I
am running FC5 with a 2.6 kernel.

I am also curious what prompted the use of LVM for this setup. Usually
you would just do a conventional extended partition, using LVM does not
really make sense.

I am curious now too. :) No particular reason, it's my personal
laptop, and I've just been seeing the LVM option during the Fedora
setup for a while, so I thought why not try it out to give me more
flexibility in resizing partitions if I needed. Learned my lesson now.
:)

I do back up every few months, my main system is at home with a CVS
repository which I backup every month or so. But I'm on a business trip
for 2 months now, so what's not backed up is just that period, but that
still worries me because there's a lot of new email in the Thunderbird
folders and Gaim chat logs, and all the gigabytes of photos I took!

Also there will be only files in lost+found, not directories. The
behaviour you observe is correct (depending on shell settings)

Yeah, I get it. So does that mean I might be able to find individual
files from inside the lost folder directly in the lost+found? And is
there any way I can make use of these "recovered folders"?

P.S.: LVM does not know what is in the partitions. You can overwrite
the whole LVM volume except the last few blocks with zeros or random
data and it will still look fine to LVM.

I do understand that, what's weird to me is just why it happened to all
the LVM partitions and not the others, with the LVM tools not noticing
anything wrong. If it was a head crash, I'd expect to see everything
messed up, and if it was a filesystem, I'd expect to see it on just one
partition. But it could be, as you mentioned, that something overwrote the
start of the hda3 partition for some reason, a region that spans clusters
from several of the logical volumes. Why I'm asking is because
I'm not sure whether to assume that the partitions are still in the
right place or not, so that I would know whether to try playing around
with the partitioning or whether to just go ahead and try to recover
files. But I do assume it's the latter, it's just a thought. Anyway,
I'm avoiding LVM from now on. :)

The fsck programs should not be used when data are lost. Anyway, you
could try ext3 data recovery programs.

Yeah, I'll keep the partition read-only and try some data
recovery software. I think what I'll do first is copy out
any files I can currently see onto a separate disk, then I'll try the
Readext2 tool. I also read about another tool to use from Windows
called R-Linux, as well as PhotoRec which specifically tries to salvage
photos.

Thanks for the help guys. Any further tips will be greatly appreciated.
:)
 

Mike Tomlinson

Arno Wagner said:
I am also curious what prompted the use of LVM for this setup. Usually
you would just do a conventional extended partition, using LVM does not
really make sense.

It seems to be the default in FC5 (which I note the OP is using.)
Doesn't make any sense to me either, especially since I installed FC5 on
a virgin disk and opted to create only the standard /boot, /, and swap
partitions. The installer then went and put the lot into a LVM volume
for reasons best known to itself.

My take is that it would be best to stick with standard partitioning
tools, since it's one less layer of obfuscation to work through when
things go wrong.

Similarly, I remove the LABEL= tags in /etc/fstab, replacing them with
the raw device names.

M.
 

microx

Mike said:
It seems to be the default in FC5 (which I note the OP is using.)
Doesn't make any sense to me either, especially since I installed FC5 on
a virgin disk and opted to create only the standard /boot, /, and swap
partitions. The installer then went and put the lot into a LVM volume
for reasons best known to itself.

Yeah, using LVM was just being fancy I guess. But I learned to keep at
least the /home partition separate anyway (LVM or not) since it makes
it clear where the important stuff is in a recovery situation.

It's also best to keep the /home partition closer to the end of the HD,
since that's the area least likely to be affected if the software goes
wild or if you accidentally format the HD. I learned this in my
previous HD crash (9 years ago, I have a history of HD corruption and
recovery). :) I had a completely dead drive back then, and I gave the
HD to a stupid repair workshop who managed to get the drive spinning
again, but then decided to repartition the whole thing as one partition
and format it! Luckily, my data partition was the 2nd half of the disk
and was not affected, but I had to re-write the partition table and
boot sector literally by hand byte-by-byte to get the data back. :)

Right now I'm copying out everything from the lost+found, and I'm happy
to be finding quite a bit of my data there already without using any
data recovery software. It seems I can also cd into the recovered
folders, I was just typing it wrong. The files and folders in
lost+found are named in the form #123456, so I was doing "cd #123456"
and that was throwing me back to the home. Then I realized I should be
doing "cd \#123456" since # is a special character in the shell, so I
can actually get into the folders and there is stuff in them, even with
their original filenames. :)
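
A small demonstration of the quoting issue, using a scratch directory and a made-up entry name in the style fsck uses for lost+found. Typed unquoted at an interactive prompt, `cd #123456` is read as `cd` followed by a comment, which is why it lands in the home directory; any of the three forms below passes the literal name instead.

```shell
# Hypothetical entry name, in the form found in lost+found.
entry='#123456'
workdir=$(mktemp -d)
mkdir "$workdir/$entry"

# Backslash-escape the leading '#'.
cd "$workdir" && cd \#123456 && pwd

# Quote the whole name.
cd "$workdir" && cd '#123456' && pwd

# Prefix with ./ so that '#' is no longer at the start of the word.
cd "$workdir" && cd ./#123456 && pwd
```

The same applies to any other command given such a name, for example when copying the recovered entries to another disk.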

Anyway as I said, after recovering, I'll repartition using standard
types again and drop the LVM rubbish.
 

Folkert Rienstra

Because of how LVM works, maybe?
Can you be a bit more specific what "screwed" means? Also:
What kernel version, what were you doing when the corruption occurred?

Is it possible that you accidentally overwrote parts of hda3?
The LVM descriptor block is at the end of the volume, so it would be
the last thing to go.


Hmm. Since you worked on the original partitions without backup, there
is a good possibility that you destroyed additional data.
Also there will be only files in lost+found, not directories.
The behaviour you observe is correct
(depending on shell settings).

So actually may not be correct.....

Advice: Stop writing to the disk immediately. Make a sector image and
work on that. Then start looking for your files in lost+found. Other steps
depend on what caused the corruption, but stop writing on the original
immediately!

I am also curious what prompted the use of LVM for this setup.
Usually you would just do a conventional extended partition,
using LVM does not really make sense.

So when does making use of LVM make sense if "using LVM does not really make sense", babblebot.

Additional advice: It is Linux and pretty corruption resistant,

As the above example easily illustrates.
 

Arno Wagner

Previously microx said:
Can you be a bit more specific what "screwed" means? Also:
What kernel version, what were you doing when the corruption
occurred?
[...]
filesystem, so then I booted with the LiveCD to see what's going on. I
am running FC5 with a 2.6 kernel.

My guess would be that you actually ran either into a rare bug in
LVM or that some other part of FC5 has a serious issue. I am inclined
to believe the latter since filesystem or LVM bugs are very rare
today. Might be a good idea to go to a newer kernel as well.

Arno
 

Folkert Rienstra

Arno Wagner said:
Previously microx said:
Can you be a bit more specific what "screwed" means? Also:
What kernel version, what were you doing when the corruption occurred?
[...]
filesystem, so then I booted with the LiveCD to see what's going on.
I am running FC5 with a 2.6 kernel.

My guess would be that you actually ran either into a rare bug in
LVM or that some other part of FC5 has a serious issue. I am inclined
to believe the latter since filesystem or LVM bugs are very rare today.

So much for "It is Linux and pretty corruption resistant", babblemouth.
 

microx

Amazing how some people can come into a useful discussion, offer no
information whatsoever, and then call others a babblebot. :)

For the record, I managed to recover 100% of my data, just using fsck.

By the way, thanks to ext3, I was much more comfortable with the
incident happening in Linux than last time when it was under Windows.
Had I been using FAT, accidentally overwriting the first few sectors of
the partition (which is also much more likely to happen with a berserk
program in Windows) would have destroyed the file allocation table and
which would mean there's no way to string the data blocks back together
again into files.

What happened was most probably my fault. I had given my regular user
id permission to directly access the /dev/hda* files by adding it to
the "disk" group and giving that group write permission. I did that to
enable VMware to run as a regular user and be able to access my native
Windows partition instead of having a second Windows installation
inside Linux. I should have done it more carefully, using the suid bit
and giving access only to a special vmware user id, and then giving it
access only to the partition it needs.
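
A quick way to audit this kind of setup is to check whether anyone besides the owner can write to a device node. The sketch below assumes GNU coreutils stat (its -c option); the function name is made up for this example.

```shell
# Succeeds if the group or "other" permission bits of a path include write.
# Returns 2 if the path cannot be examined.
writable_beyond_owner() {
    mode=$(stat -c '%a' "$1") || return 2   # octal mode, e.g. 660
    group_digit=$(( (mode / 10) % 10 ))
    other_digit=$(( mode % 10 ))
    [ $(( (group_digit | other_digit) & 2 )) -ne 0 ]
}
```

It could be run over the disk device nodes, for example `for d in /dev/hda*; do writable_beyond_owner "$d" && echo "$d"; done`. If the group is writable, the membership of that group (as listed by `getent group disk`) decides who can overwrite the raw disk.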
 

microx

By the way, a few more notes:
If /usr/local is intact, that would be further evidence.

You're right, it is intact. :)

My guess would be that you actually ran either into a rare bug in
LVM or that some other part of FC5 has a serious issue. I am inclined
to believe the latter since filesystem or LVM bugs are very rare
today. Might be a good idea to go to a newer kernel as well.

Actually, I find FC5 and Fedora in general to be quite stable. This is
the first major problem I faced since I started using it (I've been
there since FC1 was released). I haven't lost confidence in LVM either,
it's just that it tends to complicate matters in a corruption
situation. I've moved my /home into a stand-alone primary partition
now, but I'm keeping /, /usr/local and swap in the LVM, since they
wouldn't really matter if I had to recover data, and it allows me to
resize them if one of them gets full.

Thanks again to everyone who offered advice.
 

Arno Wagner

Previously microx said:
Amazing how some people can come into a useful discussion, offer no
information whatsoever, and then call others a babblebot. :)

That is just Folkert. For some reason he is terminally envious
and thinks that defamation will raise his reputation here and
decrease mine. Somewhat ineffective in comparison to trying to be
helpful and competent....

For the record, I managed to recover 100% of my data, just using fsck.

Very good. Congratulations!

By the way, thanks to ext3, I was much more comfortable with the
incident happening in Linux than last time when it was under Windows.
Had I been using FAT, accidentally overwriting the first few sectors of
the partition (which is also much more likely to happen with a berserk
program in Windows) would have destroyed the file allocation table and
which would mean there's no way to string the data blocks back together
again into files.

Indeed. That is why ext2/3 distributes the metadata all over the
partition and clusters the respective files around the directory
clusters. If you overwrite something you typically lose
metadata and data for some files but nothing for the rest.

What happened was most probably my fault. I had given my regular user
id permission to directly access the /dev/hda* files by adding him in
the "disk" group and giving that group write permission.

Ooops. ''Interesting'' idea!
Would raise my adrenalin levels considerably... ;-)

I did that to
enable VMware to run as a regular user and be able to access my native
Windows partition instead of having a second Windows installation
inside Linux. I should have done it more carefully using the suid bit
and giving access only to a special vmware user id, and then given him
access only to the partition he needs.

Hmm. No idea whether VMware is a risk here, but /dev/<some disk>
should definitely be writable only by root.

Arno
 

Arno Wagner

Previously microx said:
By the way, a few more notes:
You're right, it is intact. :)
Actually, I find FC5 and Fedora in general to be quite stable. This is
the first major problem I faced since I started using it (I've been
there since FC1 was released). I haven't lost confidence in LVM either,
it's just that it tends to complicate matters in a corruption
situation. I've moved my /home into a stand-alone primary partition
now, but I'm keeping /, /usr/local and swap in the LVM, since they
wouldn't really matter if I had to recover data, and it allows me to
resize them if one of them gets full.

Well, seeing your other posting about the ''interesting'' permissions,
I tend to agree about stability. And /home is definitely the
most critical part. The rest can be recreated with minor effort.

Arno
 

Folkert Rienstra

microx said:
Amazing how some people can come into a useful discussion,

It's not a useful discussion obviously, when LVM is questioned -in your particular situation- by babblebot.

offer no information whatsoever,

The information was identifying babblebot to you.

and then call others a babblebot. :)

(not 'a' babblebot, just "babblebot", aka "babblehead" aka "babblemouth" aka "Arnie").

Anyone who questions the use of LVM in a particular situation when the use
of LVM is just ease of use in dealing with partitions is obviously a babblebot.

If there is a point to it then it would be of using partitions at all, not using LVM.

For the record, I managed to recover 100% of my data, just using fsck.

By the way, thanks to ext3, I was much more comfortable with the
incident happening in Linux than last time when it was under Windows.
Had I been using FAT, accidentally overwriting the first few sectors of
the partition
(which is also much more likely to happen with a berserk program in Windows)

No, it is not.
It is much more likely Windows itself doing that, not 'any' berserk program.

would have destroyed the file allocation table and which would mean
there's no way to string the data blocks back together again into files.

Only for non-contiguous files.

What happened was most probably my fault.

Only if you knew that Linux was unsafe.

I had given my regular user id permission to directly access the
/dev/hda* files by adding him in the "disk" group and giving that
group write permission. I did that to enable VMware to run as a
regular user and be able to access my native Windows partition
instead of having a second Windows installation inside Linux.

So Linux is obviously unsafe if that can be done but isn't really allowed.

I should have done it more carefully using the suid bit and giving access
only to a special vmware user id, and then given him access only to the
partition he needs.

So what I hear is that Linux can be easily corrupted.
So much for "It is Linux and pretty corruption resistant"
 

microx

So Linux is obviously unsafe if that can be done but isn't really allowed.

Well, it is safe BECAUSE it is not allowed. :) I think it's a perfect
balance of flexibility and policy. The most powerful thing about Linux
and Unix in general is that anything is possible somehow, based on the
administrator's policy, but by default out of the box, you're given a
safe set-up, until you choose to change it for a specific purpose. This
is the main reason why viruses and spyware are almost non-existent on
Linux, and as far as I can tell it seems to be successful.

If the admin (like myself in this case) has chosen to allow a certain
user or group of programs to directly access the disk, then there is
always the risk of that program containing a bug or malicious code that
lets it muck up the disk. So the admin should exercise discretion over which
permissions he gives to which users and/or programs. There's no way to
disallow it completely other than having an OS policy that says for
example, "no user-space program is allowed to directly access the disk,
only the kernel is allowed to do that", and then not even giving the
admin the ability to change that policy (close to what Windows does).
In that case, nifty software like gpart, data recovery stuff, etc. will
not be possible under that OS, since everything would have to go
through the kernel's system calls, which most probably don't cover
everything that can be done with a disk, so they would have to be done
by booting into a special mode of the OS or an external system.

The alternative as in DOS and earlier versions of Windows is to allow
any user-space program to do whatever the hell it likes with no
security policy mechanism, and we all know how safe that made us. "Not
allowing" is the security feature introduced in Unix, without
compromising flexibility, and to some extent in more recent versions of
Windows.

Care to suggest an alternative approach? :)
 

Arno Wagner

Well, it is safe BECAUSE it is not allowed. :) I think it's a perfect
balance of flexibility and policy. The most powerful thing about Linux
and Unix in general is that anything is possible somehow, based on the
administrator's policy, but by default out of the box, you're given a
safe set-up, until you choose to change it for a specific purpose. This
is the main reason why viruses and spyware are almost non-existent on
Linux, and as far as I can tell it seems to be successful.
If the admin (like myself in this case) has chosen to allow a certain
user or group of programs to directly access the disk, then there is
always the risk of that program containing a bug or malicious code that
lets it muck up the disk. So the admin should exercise discretion which
permissions he gives to which users and/or programs. There's no way to
disallow it completely other than having an OS policy that says for
example, "no user-space program is allowed to directly access the disk,
only the kernel is allowed to do that", and then not even giving the
admin the ability to change that policy (close to what Windows does).
In that case, nifty software like gpart, data recovery stuff, etc. will
not be possible under that OS, since everything would have to go
through the kernel's system calls, which most probably doesn't cover
everything that can be done with a disk, so they would have to be done
by booting into a special mode of the OS or an external system.
The alternative as in DOS and earlier versions of Windows is to allow
any user-space program to do whatever the hell it likes with no
security policy mechanism, and we all know how safe that made us. "Not
allowing" is the security feature introduced in Unix, without
compromising flexibility, and to some extent in more recent versions of
Windows.

Good summary. Unix gives you the power to decide but requires
you to know how to handle it. Typical server OS approach. Windows
has the designers predict what you want to do and what not and
allows exactly that. Not sure what type of OS this is. Maybe
''kindergarten''?

Care to suggest an alternative approach? :)

I would like to hear whether there is even a single one that
is as good or better than the Unix one...

Arno
 
