Faster through multiple file requests

Suggestions and feature requests.
PointerToVoid
Posts: 4
Joined: 02.01.2008, 01:46

#1 Post by PointerToVoid » 02.01.2008, 02:16

Happy new year!

When processing many small files, for instance when copying or comparing directories, FC (like so many programs) slows down because of the hard disk's access time for each file.

However, modern disks (most SCSI, S-ATA 2, and some P-ATA models like Hitachi's) can optimize this access time by serving the accesses out of order (called NCQ on S-ATA 2). This is very efficient, provided the program keeps a queue of many file requests instead of waiting for each one to complete.

Could FC implement this?

This requires calling ReadFile() on a file opened with FILE_FLAG_OVERLAPPED. I believe it works on Windows NT4, 2000, XP, 2003 and their successors, but not on Windows 95/98/98SE/Me.
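
For illustration, a minimal sketch of such an overlapped read (the file name and buffer size are made up, and error handling is cut to the essentials):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* Open the file for asynchronous I/O. */
    HANDLE h = CreateFileA("left\\example.dat", GENERIC_READ, FILE_SHARE_READ,
                           NULL, OPEN_EXISTING, FILE_FLAG_OVERLAPPED, NULL);
    if (h == INVALID_HANDLE_VALUE) return 1;

    char buf[4096];
    OVERLAPPED ov = {0};                                /* read at offset 0 */
    ov.hEvent = CreateEventA(NULL, TRUE, FALSE, NULL);  /* manual-reset event */

    /* The read usually returns immediately with ERROR_IO_PENDING,
       so further reads on other files can be queued meanwhile. */
    if (!ReadFile(h, buf, sizeof buf, NULL, &ov) &&
        GetLastError() != ERROR_IO_PENDING) return 1;

    /* ... issue more ReadFile() calls on other files here ... */

    DWORD got;
    GetOverlappedResult(h, &ov, &got, TRUE);  /* TRUE = wait for completion */
    printf("read %lu bytes\n", (unsigned long)got);

    CloseHandle(ov.hEvent);
    CloseHandle(h);
    return 0;
}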

For instance, in directory compare, FC would detect the small files in the directories and issue read requests for up to 50 small files or up to 20 MB from the left directory, then request the corresponding files from the right directory, and compare the files as they become available in RAM.
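
Roughly like this, assuming the 50-file / 20 MB limits above and a paths/sizes list produced by the directory scan (this is only my sketch, not FC's code):

#include <windows.h>
#include <stdlib.h>
#include <string.h>

#define MAX_BATCH 50
#define MAX_BYTES (20 * 1024 * 1024)

typedef struct {
    HANDLE     file;
    OVERLAPPED ov;
    char      *buf;
    DWORD      size;
} Request;

/* Queue overlapped reads until the batch is full; the disk's NCQ can then
   serve the pending requests in whatever order needs the least head movement. */
static int queue_batch(const char **paths, const DWORD *sizes, int nfiles,
                       Request *req)
{
    int n = 0;
    DWORD total = 0;
    for (int i = 0; i < nfiles && n < MAX_BATCH; i++) {
        if (total + sizes[i] > MAX_BYTES) break;
        req[n].file = CreateFileA(paths[i], GENERIC_READ, FILE_SHARE_READ,
                                  NULL, OPEN_EXISTING, FILE_FLAG_OVERLAPPED,
                                  NULL);
        if (req[n].file == INVALID_HANDLE_VALUE) continue;
        req[n].size = sizes[i];
        req[n].buf  = malloc(sizes[i]);
        if (!req[n].buf) { CloseHandle(req[n].file); continue; }
        memset(&req[n].ov, 0, sizeof req[n].ov);
        req[n].ov.hEvent = CreateEventA(NULL, TRUE, FALSE, NULL);
        if (!ReadFile(req[n].file, req[n].buf, req[n].size, NULL, &req[n].ov)
            && GetLastError() != ERROR_IO_PENDING) {
            CloseHandle(req[n].ov.hEvent);
            CloseHandle(req[n].file);
            free(req[n].buf);
            continue;
        }
        total += sizes[i];
        n++;
    }
    return n;  /* number of reads now in flight */
}

/* Collect the completed reads; each buffer can be compared as soon as it
   lands. For simplicity this waits in issue order; WaitForMultipleObjects()
   on the events would let FC take whichever file completes first. */
static void collect_batch(Request *req, int n)
{
    for (int i = 0; i < n; i++) {
        DWORD got;
        GetOverlappedResult(req[i].file, &req[i].ov, &got, TRUE);
        /* ... compare req[i].buf against the right-hand copy here ... */
        CloseHandle(req[i].ov.hEvent);
        CloseHandle(req[i].file);
        free(req[i].buf);
    }
}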

As each directory is often contiguous on the hard disk, reading many small files from it is very fast, especially if reordering is allowed, but hopping from one directory to the other is very slow.

File copying would benefit in the same way and would be much faster than the Windows copy command.

I can email a PDF that contains a short example.

Thanks!

PointerToVoid
Posts: 4
Joined: 02.01.2008, 01:46

#2 Post by PointerToVoid » 02.01.2008, 02:27

For "Small file" FC could use the same limit as NTFS does - might be 4kB, or 1 cluster, but I'm not sure.

A higher limit would also be meaningful, something like 1% of the available RAM per file.

Processing the NTFS small files separately is an advantage because NTFS stores them in a separate location on the disk's volume, so even within a single directory, the small files sit at one location and the "big" ones at another.

I guess the optimum is to process all very small files (up to the NTFS limit) first, then all files that are small compared to the available RAM, and finally to access individually all files that don't easily fit into RAM in one piece.
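
As a sketch, the classification could be as simple as the function below; the 4 kB limit and the 1%-of-RAM cap are only my guesses from above, not documented constants:

typedef enum { TIER_TINY, TIER_SMALL, TIER_BIG } Tier;

/* Sort each file into one of the three processing passes. */
static Tier classify(unsigned long long file_size, unsigned long long avail_ram)
{
    if (file_size <= 4096)             /* guessed NTFS small-file limit */
        return TIER_TINY;              /* pass 1: batch all of these first */
    if (file_size <= avail_ram / 100)  /* "small compared to the RAM" */
        return TIER_SMALL;             /* pass 2: batch these next */
    return TIER_BIG;                   /* pass 3: read one by one, in chunks */
}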

PointerToVoid
Posts: 4
Joined: 02.01.2008, 01:46

#3 Post by PointerToVoid » 02.01.2008, 02:44

Just to quantify the available gains:

For file compare or copy, RAM and processors are fast; the disks are the limit.

When hopping from one directory to the other, the average latency exceeds half a platter rotation: at 7200 rpm one rotation takes 60/7200 s = 8.33 ms, so half a rotation is 4.17 ms.

But with NCQ allowed on contiguous directories, disks can be MUCH faster.

AttoDisk (available here: http://members.home.nl/rvandesanden/ATT ... hmark.html) measures 50 MB/s for 4 kB files and 10 MB/s for 0.5 kB files when filling a queue with 10 read requests on my disk (7200 rpm, 160 GB per platter, NCQ equivalent on P-Ata)! That corresponds to a mean latency of 0.08 ms and 0.05 ms per file, which is only possible because the disk reads the best-placed file from the request queue.
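
For anyone checking those figures, they follow directly from the throughput:

50 MB/s ÷ 4 kB per file ≈ 12 500 files/s, i.e. about 0.08 ms per file
10 MB/s ÷ 0.5 kB per file = 20 000 files/s, i.e. 0.05 ms per file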
