Copyright (C) 2000 Andrew Clausen
Permission is granted to copy, distribute and/or modify under the terms of the GNU Free Documentation License, Version 1.0 or any later version published by the Free Software Foundation. This licence can be found at http://www.gnu.org/copyleft/fdl.html
Partitioning is something most people (myself included!) find rather scary. Especially if your only backup is mirror.aarnet.edu.au. But it's something you pretty much have to do if you want to run other operating systems. There are alternatives, but you probably won't like them.
I'll be discussing two common reasons for partitioning: installing multiple operating systems, and drive imaging - essentially making copies of the same hard disk on many computers. And, of course, I'll be talking about how my program solves all of your Life Problems(TM). If there's time, I'll talk about its history, and how it works - to keep all the hackers here awake.
There are a few ways to copy floppy disks. You can do it like this:
# cp -r /mnt/floppy0 /mnt/floppy1
This works by reading each file from /mnt/floppy0 (the first floppy disk) and writing a copy to /mnt/floppy1 (the second floppy disk), one file at a time. But we left out a few commands! We need to mount and unmount the disks:
# mount /dev/fd0 /mnt/floppy0
# mount /dev/fd1 /mnt/floppy1
# cp -r /mnt/floppy0 /mnt/floppy1
# umount /mnt/floppy0
# umount /mnt/floppy1
The important thing is that the kernel of the operating system handles reading, writing and creating the files.
Here's another way to copy a disk:
# cp /dev/fd0 /dev/fd1
This works by reading each block (a small region) from /dev/fd0, and writing a copy to /dev/fd1, doing each block one at a time.
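The block-by-block copy can be tried safely on ordinary files instead of real floppies. This is a minimal sketch, assuming a POSIX shell with dd and cmp. The scratch directory and the file names floppy0.img and floppy1.img are made up for the example and merely stand in for /dev/fd0 and /dev/fd1.

```shell
# Scratch files stand in for the two floppy devices (an assumption, so that
# no real disk is touched).
workdir=$(mktemp -d)
src="$workdir/floppy0.img"
dst="$workdir/floppy1.img"

# Build a 1440k "floppy": 2880 blocks of 512 bytes with arbitrary contents.
dd if=/dev/urandom of="$src" bs=512 count=2880 2>/dev/null

# Copy it block by block. dd has no idea what the blocks mean.
dd if="$src" of="$dst" bs=512 2>/dev/null

# cmp -s prints nothing and succeeds only if the two images are identical.
if cmp -s "$src" "$dst"; then
    block_copy_result="identical"
else
    block_copy_result="different"
fi
dst_bytes=$(( $(wc -c < "$dst") ))
echo "$block_copy_result, $dst_bytes bytes copied"

rm -rf "$workdir"
```

Nothing was mounted at any point: the copy succeeds whatever file system (if any) the blocks contain.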
So what difference does it make?
1. Copying each block, one at a time, can't work if the new disk is smaller than the old disk
Obvious. If you keep copying one at a time, you'll get up to block number 2000 (or whatever it is), and have nowhere to put it!
On the other hand, if you copy each file one at a time, it might work, if all of the files are small enough to fit on the new disk. This is another way of saying, copying one file at a time will work, provided all of the blocks used for storing files (as opposed to sitting vacant) can fit on the new disk.
The problem is, when you copy a disk one block at a time, you don't know which blocks you need to copy!
2. Copying each block, one at a time, doesn't require you to mount the file systems.
When you mount a disk, the operating system attempts to make some sense of the disk. It looks at things like: how big is it? how much free space does it have?
So, when you copy each file, one at a time, the operating system has to understand the disk. When you copy one block at a time, the operating system doesn't interpret the disk in any way.
This raises an important point: how does the operating system "understand" a disk? For example, how does it know which blocks are used for files, and which ones are unused? The answer is, it depends on the type of file system. Some types of file systems you might have heard of are: Ext2 (used by Linux), FAT (used by DOS/Windows 9x), NTFS (used by Windows NT/2000) and ISO-fs (used for cdroms). There are thousands more.
The thing is, the operating system needs to have support written for each type of file system. For example, DOS doesn't understand Ext2 file systems. On the other hand, Linux supports over 30 different types of file systems!
Anyway, each different type of file system has its own methods for storing information about files, blocks, etc.
So far we've been talking about small floppy disks. However, file systems are also used on hard disks, to store not only your work, but your programs, and your operating system(s).
While it would be possible to have one file system take up the whole disk, this doesn't allow for having more than one operating system, because different operating systems must be stored on different types of file systems. Therefore, hard disks are split up into one or more partitions. A partition is like a disk inside a disk - where a single file system can be stored. The information about how big each partition is, and each partition's location on the disk is stored in the partition table (sometimes called the disk label). The partition table is stored on the first block of the disk.
When you install an operating system, it will often create one big partition for itself. Then, when you want to install another operating system, there is no room to make another partition to install it on.
One option would be to completely wipe your hard disk, and install the old operating system again, and then the new one. You could even back-up your work, so you get to keep that!
Another option is to "shrink" the big partition hogging the disk with programs like GNU Parted, Diskdrake, ext2resize or FIPS. This process requires that all files stored on blocks inside the area to be taken by the new partition be moved out of it. Then the file system needs to be modified, and then the partition table. The above programs do either some or all of the work for you. Here's a brief summary of their capabilities (only free software is considered here):
GNU Parted (http://www.gnu.org/software/parted/)
To start GNU Parted, you type
# parted /dev/hda
Obviously, replace /dev/hda with the hard disk you are using.
When it starts, it looks like this:
Typing help (or h for short) lists all of the commands you can use.
The print command displays the partitions. For example, my computer looks like this:
In the fourth column, we have the partition type. There are three types of partitions: primary, extended and logical. Primary partitions are normal partitions that contain file systems. In an ideal world, there would only be primary partitions. Unfortunately, there can only be up to four primary partitions. Which is why we need extended and logical partitions. An extended partition is a special type of primary partition that contains logical partitions instead of primary partitions. And logical partitions contain file systems. So if you run out of primary partitions, you have to use logical partitions.
The first column, with the heading "Minor", is the partition number. Since there are up to 4 primary partitions, primary partitions are numbered 1-4, and logical partitions are numbered 5 onwards.
The second and third columns give the start and end of the partitions. The unit is cylinders (more on this later). The size of each cylinder is given above as 4032k (about 4 megabytes). On this disk, the first partition starts at cylinder 10. Cylinders 1-9 are unused. This is about 9×4=36 megabytes. This is a bit of a waste, and we'll be fixing this later on.
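The wasted space can be worked out exactly from the two figures quoted above. This is only a sketch of the arithmetic in shell; the variable names are made up for the example.

```shell
cylinder_k=4032       # cylinder size reported by Parted, in k
unused_cylinders=9    # cylinders 1-9, before the first partition

wasted_k=$((unused_cylinders * cylinder_k))
echo "${wasted_k}k unused before the first partition"
```

That gives 36288k, which agrees with the rough figure of 36 megabytes.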
The second-last column is the type of file system. Pretty straightforward.
The last column is the flags. There are 2 flags: hidden - useful for hiding partitions from Windows - and boot. Boot is used by the DOS boot code to determine which partition to boot off. More on this later.
Guess what the resize command does? I used it to use some wasted space on my FAT partition, like this:
But I forgot to unmount it! Notice how you can choose to "ignore"? I strongly advise you not to ignore, because it can really screw things up if the partition is mounted. If you want to resize your root partition, you'll need to create a boot disk. I'm not going to cover this, but basically, you should compile Parted without readline (you can also download a special RPM already without readline), and stick it on a floppy disk. After you boot the boot disk, change disk to the Parted disk, and you should be able to run it.
The mkfs command creates a file system on a partition. This obliterates the file system on the partition (if there is one), and creates a new file system.
The mkpartfs command creates a new partition, and creates a file system on it.
When you create, resize or copy FAT partitions, it will often ask you if you want to use FAT32. There are two flavours of FAT: FAT16 and FAT32. FAT32 is better than FAT16, but DOS, old versions of Windows 95, and OS/2 (I believe) can't understand it.
The main difference between FAT16 and FAT32 is cluster sizes. A cluster (Microsoft terminology) is a chunk of space on the disk, where part of a file can be stored. So the first bit of a file might start at cluster number 25, and the second bit at cluster 39, and so on. All clusters on a FAT file system are the same size. So if the size of all clusters is, say, 16 kilobytes, then a file that is only 500 bytes long is still going to need an entire cluster, and a file 17 kilobytes long is going to need 2 entire clusters, even though most of the last cluster of each file is unused. This wasted space is called "slack space". The bigger the cluster size is, the more space is wasted in this way.
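The two examples above can be checked with a little shell arithmetic. This is a minimal sketch; slack_bytes is a made-up helper for illustration, not part of Parted or any FAT tool.

```shell
# slack_bytes FILE_SIZE CLUSTER_SIZE   (both in bytes)
# Prints how many bytes of the file's last cluster are left unused.
slack_bytes() {
    clusters=$(( ($1 + $2 - 1) / $2 ))   # round up to whole clusters
    echo $(( clusters * $2 - $1 ))
}

slack_bytes 500 16384      # the 500-byte file with 16k clusters
slack_bytes 17408 16384    # the 17k file, which needs 2 clusters
```

The first call prints 15884 and the second 15360, so in both cases nearly a whole 16k cluster is slack.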
Anyway, with FAT16, you can only have about 65000 clusters on the file system. So if you've got a 4 gig partition, that means you'd need to have a cluster size of 4 gig divided by 65000, which is 64k. This is enormous! To give you an idea, Linux's ext2 file system doesn't even support a cluster size of more than 4k!
However, with FAT32, you can have lots of clusters (2 to the power of 28; perhaps they should have called it FAT28). So you can use 4k clusters if you want. This is definitely worth doing.
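The cluster arithmetic for both flavours can be sketched in shell. It uses the same round figures as the text (about 65000 clusters for FAT16, 2 to the power of 28 for FAT32) and plain integer division, so the results are approximate. The variable names are made up for the example.

```shell
partition_k=$((4 * 1024 * 1024))    # a 4 gig partition, expressed in k

# FAT16: spread the partition over roughly 65000 clusters.
fat16_cluster_k=$((partition_k / 65000))    # integer division, rounds down
echo "FAT16 needs clusters of about ${fat16_cluster_k}k"

# FAT32: count how many 4k clusters the same partition needs.
fat32_max_clusters=$((1 << 28))
clusters_at_4k=$((partition_k / 4))
echo "FAT32 with 4k clusters uses $clusters_at_4k of $fat32_max_clusters clusters"
```

A 4 gig partition with 4k clusters needs about a million clusters, far below the FAT32 limit.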
The mkpart command creates a new partition, without creating a file system. This is useful if you accidentally delete a partition. You can simply create a new partition with mkpart that starts and ends in the same place, and you should have the old file system intact.
Note that if the partition used to be a primary partition before it was deleted, the new partition must also be primary. Likewise, if it was logical, it must remain logical.
As you can probably tell, you use rm (short for remove) to delete partitions, and check to check the partition, and the file system on it for errors.
If the world was an ideal place, that would be the end of the story. But there are a few more headaches.
When you turn your computer on, this is what happens:
Since the MBR and boot sector are only 512 bytes each, there isn't much room to have hard disk drivers. Nor is there much room for understanding the filesystem. This creates quite a few problems:
Anyway, this geometry system requires that each disk have a size, given in CHS. There is no (reliable) way for the operating system to find out this size, but the operating system and the BIOS must agree on what it is, nonetheless. If you think this is crazy, then you're right! The fact that your computer manages to boot comes down to guesswork and sheer luck! 9 times out of 10, using Parted blindly won't change this.
Linux guesses what the geometry is, and mostly gets it wrong. This doesn't matter that much, because Linux doesn't need to do anything with it. Parted uses what Linux guesses, but tries to make sure it gets it right. If it knows the Linux geometry is wrong, it'll have a go at guessing it. It will probably get it right, or at least, come up with a geometry that works, which I guess is the same thing. You can check for yourself, by going into the BIOS setup program. If Linux does get it wrong, then you can tell it what the real geometry is, but I won't cover that here.
If it does get it wrong, there are these possibilities for failure: Linux won't boot, DOS/Windows won't boot, DOS/Windows will obliterate its own partition, or DOS/Windows won't be able to find its own partition. You've been warned.
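To see why the BIOS and the operating system have to agree, it helps to turn a CHS address into a plain block number using the standard conversion. This is a sketch only: chs_to_lba is a made-up helper, and the two geometries (128 or 255 heads, 63 sectors per track) are assumed values for illustration, not taken from any particular disk.

```shell
# chs_to_lba CYLINDER HEAD SECTOR HEADS SECTORS_PER_TRACK
# Cylinders and heads count from 0, sectors count from 1.
chs_to_lba() {
    echo $(( ($1 * $4 + $2) * $5 + ($3 - 1) ))
}

# The same address (cylinder 10, head 0, sector 1) under two geometry guesses:
chs_to_lba 10 0 1 128 63    # prints 80640
chs_to_lba 10 0 1 255 63    # prints 160650
```

The same CHS address lands on two completely different blocks, so a partition table written under one geometry points at the wrong place when read under the other.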
Disk imaging is a method for avoiding the horrible Windows install process. For example, if you want to install Windows and Office on 1000 machines, it'll probably take you about 5 times 1000 hours. Things aren't so bad with GNU/Linux, because there are programs like Red Hat's kickstart, which allow you to automate the install of other programs, or practically anything you need to do. Therefore, disk imaging is really only used for the Windows machines.
With disk imaging, you can burn a CD with a disk image of a partition containing Windows and Office, and copy the partition directly onto the hard disks of all the computers, by sticking in a boot disk and the CD, and letting it fly. But the partition on the Windows disk is probably going to be bigger, so the partition will also have to be resized. I've had several people say that they've managed to automate this process with Linux boot floppies and parted. It is possible to use the CD ROM only, by using the floppy as the boot image on the CD. Read the CD writing HOWTO for more information. There are a few weird things you have to do to get this whole thing to work (which will be fixed in the next stable series). Anyway, this is the general process:
# dd if=/dev/zero of=/root/cdimage/diskimage bs=$((1024 * 1024)) count=640
# parted /root/cdimage/diskimage mklabel msdos mkpart primary fat 1 $((1024 * 1024))
# parted /dev/hda cp 1 /root/cdimage/diskimage 1
# dd if=/dev/hda of=/root/cdimage/mbr.image bs=446 count=1
localhost:~/parted-1.0.10# ./configure --disable-nls --without-readline; make
# mount /dev/cdrom /mnt/cdrom
# parted /dev/hda mklabel msdos mkpart primary fat 1 SOME-SIZE
# parted /mnt/cdrom/diskimage cp 1 /dev/hda 1
# dd if=/mnt/cdrom/mbr.image of=/dev/hda bs=446 count=1
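The bs=446 count=1 in the dd commands above is deliberate. In an msdos disk label, the first 446 bytes of the first block hold the boot code, the next 64 bytes hold the partition table, and the last 2 bytes are a signature. Copying only 446 bytes installs the boot code without clobbering the partition table that was just created. This is a sketch on scratch files, assuming a POSIX shell; the file names are made up and stand in for the first block of each disk.

```shell
workdir=$(mktemp -d)
old_sector="$workdir/old.sector"   # stands in for the master disk's first block
new_sector="$workdir/new.sector"   # stands in for the target disk's first block

# Two 512-byte sectors with recognisably different contents.
printf '%512s' '' | tr ' ' 'A' > "$old_sector"
printf '%512s' '' | tr ' ' 'B' > "$new_sector"

# Same shape as the dd commands above. conv=notrunc is needed only because
# the target here is a regular file, which dd would otherwise truncate.
dd if="$old_sector" of="$new_sector" bs=446 count=1 conv=notrunc 2>/dev/null

# Count the 'A' bytes in the boot code area and the 'B' bytes after it.
copied_bytes=$(( $(dd if="$new_sector" bs=1 count=446 2>/dev/null | tr -cd 'A' | wc -c) ))
untouched_bytes=$(( $(dd if="$new_sector" bs=1 skip=446 2>/dev/null | tr -cd 'B' | wc -c) ))
echo "$copied_bytes bytes of boot code replaced, $untouched_bytes bytes left alone"

rm -rf "$workdir"
```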
Obviously, I can and will make this process a lot easier. We're considering making a mini-distribution to do this. I wouldn't have time to maintain such a thing - any volunteers?
Parted was largely written by myself and Lennert Buytenhek, who lives in Holland. Matt Wilson, from Red Hat also did a bit of the partition code. I wrote the FAT code, the linux-swap code, the library interface and the front-end. Lennert wrote the ext2 code.
I started writing the FAT resizer in late 1998, and got it working in early 1999. Lennert did his ext2 resizer around the same time. Then, we discovered each other's projects, and started working together around April 1999. The FAT code underwent major surgery during this time, so it wasn't until September that a working version of Parted was available. The first stable version, 1.0.0, was released in late December.
At the moment, Parted is about 17 000 lines of code. It is going into Debian Potato (thanks to Timshel, who may be here tonight), and a few other distros you've never heard of. It'll probably end up in the next major release of Red Hat, and I have no idea about the others. The latest stable version is 1.0.10, and the latest development version is 1.1.2. It has been translated into six languages.
The development version can shrink the cluster size. We're also planning to support journaling, and other partition table types (like Mac and BSD) in the next stable version.
Parted home page: www.gnu.org/software/parted
Parted mailing list: firstname.lastname@example.org
Subscribe by mailing email@example.com with subscribe in the subject
Diskdrake home page: www.linux-mandrake.com/diskdrake
Ext2resize home page: www.leidenuniv.nl/~buytenh/ext2resize
FIPS home page: www.igd.fhg.de/~aschaefe/fips
The FAT resizer had to satisfy these three requirements:
It succeeds on speed, succeeds 99.9% of the time on reliability (we'll be fixing this to 100% soon), and is not TOO bad on memory usage, although some people have had problems.
So what has to be done to resize a partition?
One really crappy way to do it would be:
First, this won't work if you want to move the start of the partition, because the metadata (such as the boot sector and the file allocation tables) at the start of the partition has to be moved. Second, it'll take about a week to do a 100 MB partition.
To speed things up, make use of the fact that seeking is slow, but once you've got the disk to the right place, a large read or write is relatively quick. So large, continuous reads are good. This means it's good to read lots of neighbouring clusters in, in one big group. And it's good to update the FATs and the directory tree in one go.
So Parted searches for some neighbouring clusters that need to be moved and reads in the entire neighbourhood around the required clusters. When it goes to write, it also writes out the clusters in one big go. But, it needs to be careful! Some clusters in the same neighbourhood may be in use. A large continuous write may overwrite them. So Parted reads in the neighbourhood first, so when it comes to write out again, it overwrites with the same data that was there originally.
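The read-the-neighbourhood-first trick can be imitated on a scratch file. This is not Parted's code, just a sketch of the idea with dd, assuming a POSIX shell. The "disk" has eight tiny 4-byte clusters so the result is easy to read, and every file name is made up. New data goes into clusters 2 and 5, while clusters 3 and 4 are in use and must survive.

```shell
workdir=$(mktemp -d)
disk="$workdir/disk.img"
buffer="$workdir/neighbourhood.buf"
cluster=4    # bytes per cluster (unrealistically small, for readability)

printf 'aaaabbbbccccddddeeeeffffgggghhhh' > "$disk"    # clusters 0-7

# 1. One contiguous read of the neighbourhood, clusters 2-5, including the
#    in-use clusters 3 and 4.
dd if="$disk" of="$buffer" bs=$cluster skip=2 count=4 2>/dev/null

# 2. Patch only the two target clusters inside the buffer. Offsets are
#    relative to the buffer: 0 is disk cluster 2, 3 is disk cluster 5.
printf 'XXXX' | dd of="$buffer" bs=$cluster seek=0 conv=notrunc 2>/dev/null
printf 'YYYY' | dd of="$buffer" bs=$cluster seek=3 conv=notrunc 2>/dev/null

# 3. One contiguous write of the whole neighbourhood. Clusters 3 and 4 are
#    overwritten with exactly the data that was read in step 1.
dd if="$buffer" of="$disk" bs=$cluster seek=2 conv=notrunc 2>/dev/null

result=$(cat "$disk")
echo "$result"

rm -rf "$workdir"
```

The disk ends up as aaaabbbbXXXXddddeeeeYYYYgggghhhh: the two target clusters changed, and the in-use clusters between them came through the large write untouched.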
There's a catch to updating the directory tree in one go at the end. Half-way through updating, the directory tree will be inconsistent. For example, each directory keeps track of its position on the disk, with the . entry. And the parent directory also keeps track of its subdirectories. It's impossible to update both places in one go. There are more examples of such inconsistencies. They must be avoided, since updating the directory tree could take up to a minute, in some cases, leaving a large window of time for data loss.
So while Parted is copying the clusters that are outside the partition, it also copies clusters that are part of the directory tree. When it updates the directory tree at the end, it updates the copy. Then, it "activates" the copy.
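The same update-a-copy-then-activate pattern shows up whenever something must never be seen half-updated. As an analogy only (this is not how Parted touches the FAT directory tree), here it is with ordinary files, where the activation step is a rename. The file names and contents are made up for the example.

```shell
workdir=$(mktemp -d)
live="$workdir/tree"        # the version everyone currently sees
copy="$workdir/tree.new"    # the copy being updated

printf 'subdir -> cluster 25\n' > "$live"

# Update the copy. The live version stays consistent the whole time,
# however long this step takes.
sed 's/cluster 25/cluster 7/' "$live" > "$copy"
before=$(cat "$live")

# "Activate" the copy in one step. A rename within one file system is
# atomic, so there is no moment where a half-updated version is visible.
mv "$copy" "$live"
after=$(cat "$live")

echo "before: $before"
echo "after:  $after"

rm -rf "$workdir"
```

All the slow work happens on the copy, and the window in which a crash could leave things inconsistent shrinks to the single activation step.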
Parted doesn't do an atomic update of the File Allocation Tables, however. This isn't a major problem, because they lie on a contiguous region of the disk, so it takes (much) less than a second to write them out. But we plan to use journaling, so there is no chance of data loss.