Discussion:
When unsafe to assume HDD sector size is 512 bytes?
James Harris
2020-03-29 10:01:46 UTC
Some queries on interacting with Advanced Format (4k) drives.

When will it become unsafe for an OS to assume that an HDD's sector size
is 512 bytes? Or has it become unsafe already???

AIUI 4k drives have two modes: 512e meaning emulation of 512-byte
sectors and 4kn meaning 4k native.

I would hope that older commands such as ATA Read Sectors (0x20 or 0x21)
will always transfer data in 512-byte units, and there would be separate
commands to read and write larger sectors.

Or, failing that, that all drives would emulate 512-byte sectors by
default and they would have to be switched to 4kn mode if an OS was
ready for it.

But is that true? I fear it's not as it seems some drives are
specifically sold as 512e and some as 4kn.

Anyone know?
--
James Harris
JJ
2020-03-29 11:36:43 UTC
Post by James Harris
Some queries on interacting with Advanced Format (4k) drives.
When will it become unsafe for an OS to assume that an HDD's sector size
is 512 bytes? Or has it become unsafe already???
AIUI 4k drives have two modes: 512e meaning emulation of 512-byte
sectors and 4kn meaning 4k native.
I would hope that older commands such as ATA Read Sectors (0x20 or 0x21)
will always transfer data in 512-byte units, and there would be separate
commands to read and write larger sectors.
Or, failing that, that all drives would emulate 512-byte sectors by
default and they would have to be switched to 4kn mode if an OS was
ready for it.
But is that true? I fear it's not as it seems some drives are
specifically sold as 512e and some as 4kn.
Anyone know?
You'll just have to check the HDD and computer motherboard specification as
well as your OS documentation regarding 4Kn.

Here's a snippet from Wikipedia about 4Kn support by some OSes.

https://en.wikipedia.org/wiki/Advanced_Format#Overview

[quote]
For example, Windows Vista, Windows 7, Windows Server 2008, and Windows
Server 2008 R2 (with certain hotfixes installed) support 512e format drives
(but not 4Kn),[12] as do contemporary versions of FreeBSD[13][14][15] and
Linux.[16][17]

Mac OS X Tiger and onwards can use Advanced Format drives[18] and OS X
Mountain Lion 10.8.2 additionally supports encrypting those. Windows 8 and
Windows Server 2012 also support 4Kn Advanced Format.[12] Oracle Solaris 10
and 11 support 4Kn and 512e hard disk drives for non-root ZFS file systems,
while version 11.1 provides installation and boot support for 512e
devices.[19]
[/quote]
James Harris
2020-03-29 18:49:55 UTC
Post by JJ
Post by James Harris
Some queries on interacting with Advanced Format (4k) drives.
When will it become unsafe for an OS to assume that an HDD's sector size
is 512 bytes? Or has it become unsafe already???
AIUI 4k drives have two modes: 512e meaning emulation of 512-byte
sectors and 4kn meaning 4k native.
I would hope that older commands such as ATA Read Sectors (0x20 or 0x21)
will always transfer data in 512-byte units, and there would be separate
commands to read and write larger sectors.
Or, failing that, that all drives would emulate 512-byte sectors by
default and they would have to be switched to 4kn mode if an OS was
ready for it.
But is that true? I fear it's not as it seems some drives are
specifically sold as 512e and some as 4kn.
Anyone know?
You'll just have to check the HDD and computer motherboard specification as
well as your OS documentation regarding 4Kn.
Here's a snippet from Wikipedia about 4Kn support by some OSes.
https://en.wikipedia.org/wiki/Advanced_Format#Overview
[quote]
For example, Windows Vista, Windows 7, Windows Server 2008, and Windows
Server 2008 R2 (with certain hotfixes installed) support 512e format drives
(but not 4Kn),[12] as do contemporary versions of FreeBSD[13][14][15] and
Linux.[16][17]
Mac OS X Tiger and onwards can use Advanced Format drives[18] and OS X
Mountain Lion 10.8.2 additionally supports encrypting those. Windows 8 and
Windows Server 2012 also support 4Kn Advanced Format.[12] Oracle Solaris 10
and 11 support 4Kn and 512e hard disk drives for non-root ZFS file systems,
while version 11.1 provides installation and boot support for 512e
devices.[19]
[/quote]
That's informative but the query was really about reading 4kn drives in
our own OSes.
--
James Harris
James Harris
2020-04-02 16:50:09 UTC
Post by James Harris
Some queries on interacting with Advanced Format (4k) drives.
When will it become unsafe for an OS to assume that an HDD's sector size
is 512 bytes? Or has it become unsafe already???
Unless someone can say otherwise it may already be unsafe.

That's because from the limited info I've found 4kn (4k native) drives
exist and do not provide a 512 emulation mode. If such were operated by
an OS which assumed 512 bytes per sector the OS's reads could fail and
its writes could corrupt the user's data.

I haven't yet found any good info on reading 4kn drives but it appears
that one should use the normal ATA commands 0x20 and 0x21 and they will
transfer in units of 4k blocks.

Fortunately, with at least ATA 8 (December 2006) entries were placed in
the Identify block to state the sector size etc.

Word 106: Logical sectors per physical sector
Word 117-118: Words per logical sector
Word 209: Alignment of logical blocks in a physical block
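
By way of illustration, here is a minimal sketch of how those words
might be decoded, assuming `id' holds the 256 words returned by
IDENTIFY DEVICE (bit layout per ATA8-ACS; the function names are made
up for the example):

#include <stdint.h>

/* Returns the logical sector size in bytes from an IDENTIFY buffer. */
static uint32_t logical_sector_bytes(const uint16_t id[256])
{
    /* Word 106 is valid only when bit 14 is set and bit 15 is clear. */
    if ((id[106] & 0xC000) != 0x4000)
        return 512;                 /* no info: assume the classic size */

    /* Bit 12 set: the logical sector is longer than 256 words, and
       words 117-118 give its size in words (117 is the low word). */
    if (id[106] & (1 << 12))
        return (((uint32_t)id[118] << 16) | id[117]) * 2;

    return 512;
}

/* Returns how many logical sectors make up one physical sector. */
static uint32_t logical_per_physical(const uint16_t id[256])
{
    /* Bit 13 set: bits 3:0 hold log2(logical sectors per physical). */
    if ((id[106] & 0xC000) == 0x4000 && (id[106] & (1 << 13)))
        return 1u << (id[106] & 0x0F);

    return 1;
}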
--
James Harris
James Harris
2020-04-02 16:59:28 UTC
Post by James Harris
Some queries on interacting with Advanced Format (4k) drives.
When will it become unsafe for an OS to assume that an HDD's sector
size is 512 bytes? Or has it become unsafe already???
Unless someone can say  otherwise it may already be unsafe.
That's because from the limited info I've found 4kn (4k native) drives
exist and do not provide a 512 emulation mode. If such were operated by
an OS which assumed 512 bytes per sector the OS's reads could fail and
its writes could corrupt the user's data.
I haven't yet found any good info on reading 4kn drives but it appears
that one should use the normal ATA commands 0x20 and 0x21 and they will
transfer in units of 4k blocks.
Fortunately, with at least ATA 8 (December 2006) entries were placed in
the Identify block to state the sector size etc.
Word 106: Logical sectors per physical sector
Word 117-118: Words per logical sector
Word 209: Alignment of logical blocks in a physical block
Just to add to that there's a note on the history at

https://en.wikipedia.org/wiki/Advanced_Format#History

though the current links do not appear to verify its claims of there
being some 1K-sector drives.
--
James Harris
Scott Lurndal
2020-04-02 18:05:39 UTC
Post by James Harris
Post by James Harris
Some queries on interacting with Advanced Format (4k) drives.
When will it become unsafe for an OS to assume that an HDD's sector
size is 512 bytes? Or has it become unsafe already???
Unless someone can say  otherwise it may already be unsafe.
That's because from the limited info I've found 4kn (4k native) drives
exist and do not provide a 512 emulation mode. If such were operated by
an OS which assumed 512 bytes per sector the OS's reads could fail and
its writes could corrupt the user's data.
I haven't yet found any good info on reading 4kn drives but it appears
that one should use the normal ATA commands 0x20 and 0x21 and they will
transfer in units of 4k blocks.
Fortunately, with at least ATA 8 (December 2006) entries were placed in
the Identify block to state the sector size etc.
Word 106: Logical sectors per physical sector
Word 117-118: Words per logical sector
Word 209: Alignment of logical blocks in a physical block
Just to add to that there's a note on the history at
https://en.wikipedia.org/wiki/Advanced_Format#History
though the current links do not appear to verify its claims of there
being some 1K-sector drives.
I've seen drives (SCSI/SAS and SATA) with 100-byte sectors, 180-byte
sectors, 512-byte sectors, 520-byte sectors, 528-byte sectors,
and 4096-byte sectors. All of them have a discovery (IDENTIFY) mode to
determine the sector size (and all SCSI and some SATA have mode pages that allow
the sector size to be changed).
wolfgang kern
2020-04-03 08:56:25 UTC
Post by James Harris
Some queries on interacting with Advanced Format (4k) drives.
When will it become unsafe for an OS to assume that an HDD's sector
size is 512 bytes? Or has it become unsafe already???
Unless someone can say  otherwise it may already be unsafe.
I have three 3TB SATA HDs in my workstation, all with 4096-byte sectors.
But my OS still uses native port access with 512-byte units in LBA48 mode.
This 4096 size seems to mean just major alignment and not access size.
That's because from the limited info I've found 4kn (4k native) drives
exist and do not provide a 512 emulation mode. If such were operated by
an OS which assumed 512 bytes per sector the OS's reads could fail and
its writes could corrupt the user's data.
Haven't encountered a problem here so far.
I haven't yet found any good info on reading 4kn drives but it appears
that one should use the normal ATA commands 0x20 and 0x21 and they will
transfer in units of 4k blocks.
I cannot confirm this. I use the 0x24/0x34 (READ/WRITE SECTORS EXT)
commands for LBA48 access.
Fortunately, with at least ATA 8 (December 2006) entries were placed in
the Identify block to state the sector size etc.
Word 106: Logical sectors per physical sector
Word 117-118: Words per logical sector
Word 209: Alignment of logical blocks in a physical block
Yeah, a bit confusing, but the 512-byte size seems to have survived.
__
wolfgang
James Harris
2020-04-03 17:19:29 UTC
Post by wolfgang kern
Post by James Harris
Some queries on interacting with Advanced Format (4k) drives.
When will it become unsafe for an OS to assume that an HDD's sector
size is 512 bytes? Or has it become unsafe already???
Unless someone can say  otherwise it may already be unsafe.
I have three 3TB SATA HDs in my workstation, all with 4096-byte sectors.
But my OS still uses native port access with 512-byte units in LBA48 mode.
This 4096 size seems to mean just major alignment and not access size.
That sounds like 512e

https://en.wikipedia.org/wiki/Advanced_Format#512e
--
James Harris
Rod Pemberton
2020-08-01 23:08:07 UTC
On Thu, 2 Apr 2020 17:50:09 +0100
Post by James Harris
That's because from the limited info I've found 4kn (4k native)
drives exist and do not provide a 512 emulation mode. If such were
operated by an OS which assumed 512 bytes per sector the OS's reads
could fail and its writes could corrupt the user's data.
I don't follow.

How do you come to this conclusion? i.e., failed reads, corrupted writes

I.e., I would /assume/ that the 512 bytes were read or written to the
start of the 4096 bytes, and the remaining 3584 bytes would be
inaccessible to the 512 byte capable host.

In other words, inaccessibility of data or the appearance of data
corruption would only be likely if the drive was written to in a
4K-only capable machine, and then read from a 512 byte capable machine.

Obviously, I said /assume/ as I'm not sure how this works. The drives
may translate 512-byte calls into 4K by reading the eight 512-byte
sectors into a buffer, writing the 512 bytes into the correct location
in the buffer, and then writing the 4K back out to the disk.


Rod Pemberton
--
"You only need 5.13 bits to represent a 'unique' snowflake."
James Harris
2020-08-03 12:56:07 UTC
Post by Rod Pemberton
On Thu, 2 Apr 2020 17:50:09 +0100
Post by James Harris
That's because from the limited info I've found 4kn (4k native)
drives exist and do not provide a 512 emulation mode. If such were
operated by an OS which assumed 512 bytes per sector the OS's reads
could fail and its writes could corrupt the user's data.
I don't follow.
How do you come to this conclusion? i.e., failed reads, corrupted writes
I was thinking that with a switch to 4k block sizes the IDE/ATA commands
would work on chunks of 4k while the OS still assumed chunks of 512 bytes.

As an example of a read failure the OS might ask for LBA 1000 wanting
the 512 bytes beginning at 512,000. It would, instead, get the 4096
bytes from 4,096,000.

A corrupting write could be the drive overwriting 4096 bytes of the
disk when the driver intended to overwrite only 512 bytes. And it
would write them in a different place from the one intended.
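
To put numbers on it (a sketch; sizes in bytes):

/* The byte offset implied by an LBA depends on the sector size in
   force, so the OS and the drive can disagree about where LBA 1000 is: */
uint64_t lba = 1000;
uint64_t off_os_expects = lba * 512;    /* =   512,000 */
uint64_t off_drive_uses = lba * 4096;   /* = 4,096,000 */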
Post by Rod Pemberton
I.e., I would /assume/ that the 512 bytes were read or written to the
start of the 4096 bytes, and the remaining 3584 bytes would be
inaccessible to the 512 byte capable host.
In other words, inaccessibility of data or the appearance of data
corruption would only be likely if the drive was written to in a
4K-only capable machine, and then read from a 512 byte capable machine.
Obviously, I said /assume/ as I'm not sure how this works. The drives
may translate 512-byte calls into 4K by reading the eight 512-byte
sectors into a buffer, writing the 512 bytes into the correct location
in the buffer, and then writing the 4K back out to the disk.
A drive operating in 512e mode would behave as though its sector size
was 512 bytes. A drive operating in 4kn mode would behave as though its
sector size was 4,096 bytes.

The ATA controller's read and write commands assume a certain sector
size. I don't think there's any IO command which takes the sector size
as a parameter.
--
James Harris
Benjamin David Lunt
2020-04-23 03:38:17 UTC
But is that true? I fear it's not as it seems some drives are specifically
sold as 512e and some as 4kn.
Anyone know?
Hi guys,

It has been a long time since I actually posted here. I stop by
every once in a while to read, but never really took the time to
post. Sorry that this thread is nearly 30 days old.

I was reading the "Absent Friends" thread a bit too. To get to
Usenet, I actually have to move to an older machine so that I can
still use "Outlook Express" to access my Usenet stuff. This is
probably one of the reasons I haven't been here in a while. The
other is like Rod was saying, I have other things in my life that
have been more important. One very important aspect is I have a
brand new grand-daughter. Ya, Rick, I am nearly 50 myself :-)

Anyway, I thought I would just mention a technique I do to make
sure the drive is a 512-byte sector drive.

I read a sector, assuming it is 512 bytes. If the controller then
fires an interrupt *and* the DRQ bit is now clear, I can safely
assume a 512-byte sector.

However, if no interrupt is fired *and* the DRQ bit is still set
after the 256th word is read, I try to read another 512 bytes
(256 words).

I then do the same test. This works for any size of sector as long
as it is a multiple of 256 words. However, with a simple change
I can actually move that down to word-by-word reads, checking for
any size of sector as long as it is a multiple of two bytes.

words_per_sector = 0;
send_read_command();                /* e.g. READ SECTORS, one sector */
do {
    read_256_words();               /* 256 words = 512 bytes (REP INSW) */
    words_per_sector += 256;
    /* IRQ fired and DRQ clear => the sector is complete;
       DRQ still set => the sector is bigger, keep reading. */
} while ((interrupt == 0) && (DRQ == 1));

"words_per_sector" now contains the size of the sector (in words).

I do this for CD-ROMs as well, though a packet command will
return the size for us in the HIGH and MID (byte count) registers;
note that you have to complete the read command or do a reset after
reading those two registers.
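
For example, a sketch of grabbing that size (primary channel assumed
at the legacy ports; inb() standing in for whatever port-input helper
you use):

/* For an ATAPI PACKET command, the LBA Mid/High registers (0x1F4 and
   0x1F5 on the primary channel) hold the byte count of the current
   DRQ data block, low byte then high byte. */
uint16_t byte_count = inb(0x1F4) | ((uint16_t)inb(0x1F5) << 8);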

This technique has worked on about 95% of the hardware I have tested
it on, and I have not yet run into a 4096-byte only drive with it.
Fortunately, with at least ATA 8 (December 2006) entries were placed in
the Identify block to state the sector size etc.
My notes state that ATAPI 7 started the logical sector addition,
allowing multiple logical sectors per physical sector, as well
as words 106 and 117-118. However, I haven't checked this in years.

Anyway, just thought I would let you know what I do in this situation.

Back to "Absent Friends", I frequent the forum.osdev.org forum and
see Alex there quite a bit. Do any of the rest of you visit that
forum?

Good to "see" you guys again,

Ben
- http://www.fysnet.net/osdesign_book_series.htm
James Harris
2020-04-23 13:16:09 UTC
Post by Benjamin David Lunt
But is that true? I fear it's not as it seems some drives are specifically
sold as 512e and some as 4kn.
Anyone know?
Hi guys,
...

Hi Ben, good to hear from you again and congratulations on the birth of
your granddaughter!
Post by Benjamin David Lunt
It has been a long time since I actually posted here. I stop by
every once in a while to read, but never really took the time to
post. Sorry that this thread is nearly 30 days old.
I was reading the "Absent Friends" thread a bit too. To get to
Usenet, I actually have to move to an older machine so that I can
still use "Outlook Express" to access my Usenet stuff.
In the interests of it becoming easier for you to post here more often
... [:-)] while I don't know whether it will run on your OS I switched
from OE to Thunderbird on Windows and then, later, to Thunderbird on
Ubuntu. I miss some of the OE stuff but Thunderbird is pretty good.


...
Post by Benjamin David Lunt
This technique has worked on about 95% of the hardware I have tested
it on, and I have not yet run into a 4096-byte only drive with it.
You know the obvious question that throws up: Why did it not work on the
other 5%? It looked to me as though your method should have provided
100% coverage.

An option I thought of was to read the same sector to two buffers. If we
had pre-filled each buffer with different values then it would be easy
to work out how many bytes were overwritten. Simply count the number of
bytes which matched (and for extra checking ensure that those which
followed had not been altered).

That should resolve it to byte level and would work with both PIO and
DMA although the latter would require either the pre-allocation of
buffers which were 'big enough' or page protection to guard against
unbounded writes.


...
Post by Benjamin David Lunt
Back to "Absent Friends", I frequent the forum.osdev.org forum and
see Alex there quite a bit. Do any of the rest of you visit that
forum?
I have tried osdev for discussions but I didn't like it much. IIRC it
was harder to follow threads and subthreads. Can't remember specifically
why but I didn't find it anything like as usable as the hierarchical
nature of Usenet.

That said, I should probably try osdev again. It seemed to be well
populated.
--
James Harris
Benjamin David Lunt
2020-04-24 01:23:16 UTC
Post by James Harris
Hi Ben, good to hear from you again and congratulations on the birth of
your granddaughter!
Thank you.
Post by James Harris
In the interests of it becoming easier for you to post here more often ...
[:-)] while I don't know whether it will run on your OS I switched from OE
to Thunderbird on Windows and then, later, to Thunderbird on Ubuntu. I
miss some of the OE stuff but Thunderbird is pretty good.
I will have a look. Thanks.
Post by James Harris
Post by Benjamin David Lunt
This technique has worked on about 95% of the hardware I have tested
it on, and I have not yet run into a 4096-byte only drive with it.
You know the obvious question that throws up: Why did it not work on the
other 5%? It looked to me as though your method should have provided 100%
coverage.
I don't remember why. I don't remember if it was the technique that failed
or if it was reading from some drives with the commands I was using that
failed. However, every once in a while, this function would return zero
for the sector size. It has been so long since I worked on that part
that I have totally forgotten why. I just remember it failing once in
a while. Hence the 95% statement, instead of stating 100%.
Post by James Harris
An option I thought of was to read the same sector to two buffers. If we
had pre-filled each buffer with different values then it would be easy to
work out how many bytes were overwritten. Simply count the number of bytes
which matched (and for extra checking ensure that those which followed had
not been altered).
That should resolve it to byte level and would work with both PIO and DMA
although the latter would require either the pre-allocation of buffers
which were 'big enough' or page protection to guard against unbounded
writes.
I agree here. Though with PIO you, as the programmer, can tell exactly
how many words you read, since you use an explicit inpw instruction on
each read while checking the DRQ bit. As you stated, this should
work 100% of the time.

With the DMA method, yes, you would have to make sure the buffer was
'big enough'.

The only caveat I would see with the DMA method is that if you preset
the buffer with a value and happened to read the exact same value, how
would you know where the read stopped? However, this risk would be very
minimal, I think.
Post by James Harris
Post by Benjamin David Lunt
Back to "Absent Friends", I frequent the forum.osdev.org forum and
see Alex there quite a bit. Do any of the rest of you visit that
forum?
I have tried osdev for discussions but I didn't like it much. IIRC it was
harder to follow threads and subthreads. Can't remember specifically why
but I didn't find it anything like as usable as the hierarchical nature of
Usenet.
That said, I should probably try osdev again. It seemed to be well
populated.
It used to have quite a bit of "off topic" and some harassment of
members, as this group does/did. However, the moderators were a bit
lax at the time. They have become a lot stricter now and it seems
to have helped. Unfortunately, we lost a good member in the process.
In my opinion, he was a wealth of information to the group, but
pressured by a few members, his membership was removed.
Very unfortunate.

Thanks again,
Ben
T. Ment
2020-04-24 02:38:01 UTC
Post by Benjamin David Lunt
Unfortunately, we lost a good member in the process.
In my opinion, he was a wealth of information to the group,
but pressured by a few members, his membership was removed.
Very unfortunate.
As web forums prove, computer geeks are insecure whiny cry babies. What
a bunch of losers.
James Harris
2020-04-24 10:19:41 UTC
...
Post by Benjamin David Lunt
Post by James Harris
Post by Benjamin David Lunt
This technique has worked on about 95% of the hardware I have tested
it on, and I have not yet run into a 4096-byte only drive with it.
You know the obvious question that throws up: Why did it not work on the
other 5%? It looked to me as though your method should have provided 100%
coverage.
I don't remember why. I don't remember if it was the technique that failed
or if it was reading from some drives with the commands I was using that
failed. However, every once in a while, this function would return zero
for the sector size. It has been so long since I worked on that part
that I have totally forgotten why. I just remember it failing once in
a while. Hence the 95% statement, instead of stating 100%.
I've acquired a number of test machines over the years, usually chosen
because although they are 'IBM compatible' they were unusual in some
way, and I have found that code of mine which works on most of them runs
into a snag or two on the others. That's why I bought them. I wanted to
have to get my code working with a variety of hardware.

Actually, having said that, I should correct it and say that I suspect
most of the faults are not really in the hardware but in the machine
trapping to SMM and the SMM code not doing everything it's supposed
to - such as setting status bits.
Post by Benjamin David Lunt
Post by James Harris
An option I thought of was to read the same sector to two buffers. If we
had pre-filled each buffer with different values then it would be easy to
work out how many bytes were overwritten. Simply count the number of bytes
which matched (and for extra checking ensure that those which followed had
not been altered).
That should resolve it to byte level and would work with both PIO and DMA
although the latter would require either the pre-allocation of buffers
which were 'big enough' or page protection to guard against unbounded
writes.
I agree here. Though with PIO you, as the programmer, can tell exactly
how many words you read, since you use an explicit inpw instruction on
each read while checking the DRQ bit. As you stated, this should
work 100% of the time.
I have at least one machine which fails to set a status bit when it
should. I suspect SMM in that case. Not sure if it could apply to the
ATA interface. Perhaps not. But I have another machine where slowness to
effect an update is a potential cause of problems. It's a Toshiba Tecra
which you may remember gave Linux problems over the A20 gate. I found
that it is just a bit slow to enable the A20 gate. From memory I think
it takes only about 68 µs to enable the gate, so it's not long in real
terms, but an OS must wait for the change to take effect before starting
to use memory above 1M.

It's partly because of seeing a machine take a while to effect a setting
that I had reservations about checking the DRQ bit. I just wasn't sure
how long it would take DRQ to change on the slowest machine.
Post by Benjamin David Lunt
With the DMA method, yes, you would have to make sure the buffer was
'big enough'.
The only caveat I would see with the DMA method is that if you preset
the buffer with a value and happened to read the exact same value, how
would you know where the read stopped? However, this risk would be very
minimal, I think.
That's why I was thinking there would be /two/ buffers pre-filled with
different contents. To make a manageable example say the blocks on the
disk are just 4 bytes (not 4k) long and we set up two buffers, each 8
bytes long. We could pre-fill the buffers with (in hex)

00 00 00 00 00 00 00 00
FF FF FF FF FF FF FF FF

Then say the block we are going to read for this test contains (also in
hex) 22 22 22 22. After reading the block into both buffers they would
hold

22 22 22 22 00 00 00 00
22 22 22 22 FF FF FF FF

A count of initial matches between the two buffers would yield 4, the
size of the block.

Even if some of the data read from the disk were to match what was in
one of the buffers the technique should still work. Say the block read
from disk contained 00 FF 00 FF so that its last byte matches what we
pre-loaded into the second buffer. Then after the reads the two buffers
would hold

00 FF 00 FF 00 00 00 00
00 FF 00 FF FF FF FF FF

Even though the block read from the disk ended with the same value as
was already in the second buffer (i.e. FF) a count of the number of
matched bytes would still yield a block size of 4.

And the rest of the buffers could be compared with their initial contents
as a sanity check.

At least, that's the idea. :-)
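
A sketch of that in code (read_sector_into() is a hypothetical driver
routine and BUF_MAX a pre-allocated 'big enough' size):

#include <stdint.h>
#include <string.h>

#define BUF_MAX 8192            /* 'big enough' for any expected sector */

extern void read_sector_into(uint32_t lba, uint8_t *buf);   /* hypothetical */

/* Read the same LBA into two differently pre-filled buffers and count
   the leading bytes on which they agree: that count is the block size. */
static unsigned probe_block_size(uint32_t lba)
{
    static uint8_t a[BUF_MAX], b[BUF_MAX];
    unsigned n = 0;

    memset(a, 0x00, sizeof a);
    memset(b, 0xFF, sizeof b);

    read_sector_into(lba, a);
    read_sector_into(lba, b);

    while (n < BUF_MAX && a[n] == b[n])
        n++;

    return n;
}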
--
James Harris
T. Ment
2020-04-24 16:14:35 UTC
Permalink
Post by James Harris
I've acquired a number of test machines over the years, usually chosen
because although they are 'IBM compatible' they were unusual in some
way
"IBM compatible" is what I said to you. Maybe you're sneaking over to
Google groups. Silly to KF someone and break your own rules.

Some people like to announce shunning and expect others to follow them.
That may work on a web forum with a moderated safe space, but you losers
can't ban anyone from Usenet.

Whoever wants to KF me, do it. Individual choice is the way to freedom.
Not power tripping moderators who think they know best.
Benjamin David Lunt
2020-04-25 04:07:03 UTC
Post by James Harris
I've acquired a number of test machines over the years, usually chosen
because although they are 'IBM compatible' they were unusual in some way,
and I have found that code of mine which works on most of them runs into a
snag or two on the others. That's why I bought them. I wanted to have to
get my code working with a variety of hardware.
I have done the same. I can count eight just sitting here at my desk.
More in storage. :-)
Post by James Harris
It's partly because of seeing a machine take a while to effect a setting
that I had reservations about checking the DRQ bit. I just wasn't sure how
long it would take DRQ to change on the slowest machine.
Indeed, but you would only do this once per drive, so the time would
be very insignificant.
Post by James Harris
That's why I was thinking there would be /two/ buffers pre-filled with
different contents. To make a manageable example say the blocks on the
disk are just 4 bytes (not 4k) long and we set up two buffers, each 8
bytes long. We could pre-fill the buffers with (in hex)
00 00 00 00 00 00 00 00
FF FF FF FF FF FF FF FF
Then say the block we are going to read for this test contains (also in
hex) 22 22 22 22. After reading the block into both buffers they would
hold
22 22 22 22 00 00 00 00
22 22 22 22 FF FF FF FF
A count of initial matches between the two buffers would yield 4, the size
of the block.
Even if some of the data read from the disk were to match what was in one
of the buffers the technique should still work. Say the block read from
disk contained 00 FF 00 FF so that its last byte matches what we
pre-loaded into the second buffer. Then after the reads the two buffers
would hold
00 FF 00 FF 00 00 00 00
00 FF 00 FF FF FF FF FF
Even though the block read from the disk ended with the same value as was
already in the second buffer (i.e. FF) a count of the number of matched
bytes would still yield a block size of 4.
And the rest of the buffers could be compared with their initial contents
as a sanity check.
At least, that's the idea. :-)
This would work, yes, as long as you read the same sector into both
buffers. It's a good idea.

Ben
James Harris
2020-04-25 09:16:14 UTC
...
Post by Benjamin David Lunt
Post by James Harris
It's partly because of seeing a machine take a while to effect a setting
that I had reservations about checking the DRQ bit. I just wasn't sure how
long it would take DRQ to change on the slowest machine.
Indeed, but you would only do this once per drive, so the time would
be very insignificant.
It wasn't the overall time that concerned me but, AIUI, your method
includes checking DRQ after the 256th word of each sector. I wondered if
some hardware could be a little slow to update DRQ and therefore that
such a check could get the wrong info - presumably that DRQ would still
be asserted because the ATA interface hadn't deasserted it in time.

The specifications may be that DRQ should be deasserted instantly but
I'm always wary of how 'instantly' that is! And there's always the
danger of wildly non-compliant hardware.

For example, I remember reading of one drive which under certain
conditions (something to do with how DMA was set up, possibly) would
even corrupt data it was writing to disk. I posted about it here but
cannot find the post just now. I guess that OSes like Linux and Windows
will include code to deal with rogue disk hardware just as they do with
CPU errata, because such disks were used in the wild.

Bottom line: I am simply wary of some hardware responding wrongly.
--
James Harris
James Harris
2020-08-01 08:21:33 UTC
On 29/03/2020 11:01, James Harris wrote:

...
Post by James Harris
But is that true? I fear it's not as it seems some drives are
specifically sold as 512e and some as 4kn.
A bit of additional info on this. Some Seagate drives can be switched
between 512e and 4kn with what Seagate calls Fast Format.

https://www.seagate.com/files/www-content/product-content/enterprise-performance-savvio-fam/enterprise-performance-15k-hdd/_cross-product/_shared/doc/seagate-fast-format-white-paper-04tp699-1-1701us.pdf
--
James Harris
James Harris
2021-01-16 08:50:52 UTC
Post by James Harris
...
Post by James Harris
But is that true? I fear it's not as it seems some drives are
specifically sold as 512e and some as 4kn.
A bit of additional info on this. Some Seagate drives can be switched
between 512e and 4kn with what Seagate calls Fast Format.
https://www.seagate.com/files/www-content/product-content/enterprise-performance-savvio-fam/enterprise-performance-15k-hdd/_cross-product/_shared/doc/seagate-fast-format-white-paper-04tp699-1-1701us.pdf
Going back to this topic of 512 vs 4k drives I've just come across a
curious anomaly that you might find of interest as it relates to OS
performance. fdisk says a certain drive's logical and physical
blocksizes are both 512 bytes whereas the kernel apparently thinks the
physical blocksize is 4k.

I found that writing 512-byte blocks to the drive (under Linux, with dd)
was very slow. A complete overwrite took four hours. By contrast,
writing it in 4k blocks took an hour and a half.

The reason, it seems from iostat, is that 4k blocks simply get written
(i.e. there's no read activity) whereas 512-byte blocks require as much
reading as writing. Evidently, if a program writes a 512-byte block to
the drive, the kernel will (if necessary) read a bigger block before
modifying it and writing it back.

I've no idea why but I think it must be the kernel which is making this
(bad) decision.
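
The two runs were along these lines (from memory - the exact
invocations may have differed):

$ sudo dd if=/dev/zero of=/dev/sdh bs=512    # four hours; reads matched writes
$ sudo dd if=/dev/zero of=/dev/sdh bs=4096   # an hour and a half; writes only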


For the record, here's what fdisk shows:

$ sudo fdisk /dev/sdh -l
Disk /dev/sdh: 58.2 GiB, 62461575168 bytes, 121995264 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
--
James Harris
wolfgang kern
2021-01-16 10:03:28 UTC
Post by James Harris
Post by James Harris
Post by James Harris
But is that true? I fear it's not as it seems some drives are
specifically sold as 512e and some as 4kn.
A bit of additional info on this. Some Seagate drives can be switched
between 512e and 4kn with what Seagate calls Fast Format.
https://www.seagate.com/files/www-content/product-content/enterprise-performance-savvio-fam/enterprise-performance-15k-hdd/_cross-product/_shared/doc/seagate-fast-format-white-paper-04tp699-1-1701us.pdf
Going back to this topic of 512 vs 4k drives I've just come across a
curious anomaly that you might find of interest as it relates to OS
performance. fdisk says a certain drive's logical and physical
blocksizes are both 512 bytes whereas the kernel apparently thinks the
physical blocksize is 4k.
I found that writing 512-byte blocks to the drive (under Linux, with dd)
was very slow. A complete overwrite took four hours. By contrast,
writing it in 4k blocks took an hour and a half.
The reason, it seems from iostat, is that 4k blocks simply get written
(i.e. there's no read activity) whereas 512-byte blocks require as much
reading as writing, evidently if a program writes a 512-byte block to
the drive the kernel will (if necessary) read a bigger block before
modifying it and writing it back.
I've no idea why but I think it must be the kernel which is making this
(bad) decision.
$ sudo fdisk /dev/sdh -l
Disk /dev/sdh: 58.2 GiB, 62461575168 bytes, 121995264 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
I won't trust loonix core nor sudo fdisk :)
My direct hardware way of testing (LBA48 PIO mode) shows that 4K drives
are only a bit faster if accessed at 4K bounds. I cannot confirm that a
512-byte block write needs read-modify-write. It's just a bit slower.
But I haven't tested stream-mode.
__
wolfgang
James Harris
2021-01-16 13:24:40 UTC
Permalink
...
Post by wolfgang kern
Post by James Harris
$ sudo fdisk /dev/sdh -l
Disk /dev/sdh: 58.2 GiB, 62461575168 bytes, 121995264 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
I won't trust loonix core nor sudo fdisk :)
Nor me. Take them as indicative, not definitive. :-)
Post by wolfgang kern
My direct hardware way of testing (LBA48 PIO mode) shows that 4K drives
are only a bit faster if accessed at 4K bounds.
If you are issuing PIO commands then aren't you working with the drives'
/logical/ block sizes - probably 0.5k?
Post by wolfgang kern
I cannot confirm that a
512-byte block write needs read-modify-write. It's just a bit slower.
AIUI, any such read-modify-write would be handled within the drive
itself. You send it a 0.5k block; if the physical 4k block it's part of
is not in cache then the 4k block is read in to the drive's cache; the
cached 4k block is modified and marked as dirty; at some point the dirty
block is written back.

I think the problem I mentioned in the prior post was due to /Linux/
carrying out r-m-w - for no good reason I can think of.
Post by wolfgang kern
But I haven't tested stream-mode.
What do you mean by 'stream mode'?
--
James Harris
wolfgang kern
2021-01-17 07:31:41 UTC
Post by James Harris
Post by wolfgang kern
I won't trust loonix core nor sudo fdisk :)
Nor me. Take them as indicative, not definitive. :-)
:)
Post by James Harris
Post by wolfgang kern
My direct hardware way of testing (LBA48 PIO mode) shows that 4K drives
are only a bit faster if accessed at 4K bounds.
If you are issuing PIO commands then aren't you working with the drives'
/logical/ block sizes - probably 0.5k?
It takes the figures reported by IDENTIFY DRV, aka the hardware, not guessing :)
Post by James Harris
Post by wolfgang kern
I cannot confirm that a 512-byte block write needs read-modify-write.
It's just a bit slower.
AIUI, any such read-modify-write would be handled within the drive
itself. You send it a 0.5k block; if the physical 4k block it's part of
is not in cache then the 4k block is read into the drive's cache; the
cached 4k block is modified and marked as dirty; at some point the dirty
block is written back.
This would mean a more drastic speed loss than I found on my HDs.
Misaligned access may (if not cached) see a penalty of one revolution.
Post by James Harris
I think the problem I mentioned in the prior post was due to /Linux/
carrying out r-m-w - for no good reason I can think of.
Post by wolfgang kern
But I haven't tested stream-mode.
What do you mean by 'stream mode'?
Fast consecutive UDMA reads into high-speed channels, like modern sound
and graphics devices seem to have (bypassing RAM).
__
wolfgang
Rod Pemberton
2021-01-18 06:22:16 UTC
On Sat, 16 Jan 2021 08:50:52 +0000
Post by James Harris
Post by James Harris
Post by James Harris
But is that true? I fear it's not as it seems some drives are
specifically sold as 512e and some as 4kn.
A bit of additional info on this. Some Seagate drives can be
switched between 512e and 4kn with what Seagate calls Fast Format.
Going back to this topic of 512 vs 4k drives I've just come across a
curious anomaly that you might find of interest as it relates to OS
performance. fdisk says a certain drive's logical and physical
blocksizes are both 512 bytes whereas the kernel apparently thinks
the physical blocksize is 4k.
I found that writing 512-byte blocks to the drive (under Linux, with
dd) was very slow. A complete overwrite took four hours. By contrast,
writing it in 4k blocks took an hour and a half.
The reason, it seems from iostat, is that 4k blocks simply get
written (i.e. there's no read activity) whereas 512-byte blocks
require as much reading as writing, evidently if a program writes a
512-byte block to the drive the kernel will (if necessary) read a
bigger block before modifying it and writing it back.
I've no idea why but I think it must be the kernel which is making
this (bad) decision.
$ sudo fdisk /dev/sdh -l
Disk /dev/sdh: 58.2 GiB, 62461575168 bytes, 121995264 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
There has to be a setting somewhere. A while back, I tracked down the
minimum allocation to my Linux swap partition as being set for multiples
of 8 sectors, i.e., 4KB. The queue directory below should have a file
for the logical and physical block sizes for your device. Maybe
there is some setting in sysctl.conf. I'd check Google for articles
about tweaking Linux disk throughput or I/O.

/sys/block/sdh/queue/*
/etc/sysctl.conf
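
For example (values illustrative - this is what a drive the kernel
treats as 4K-physical might report):

$ cat /sys/block/sdh/queue/logical_block_size
512
$ cat /sys/block/sdh/queue/physical_block_size
4096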

--
Scott Lurndal
2021-01-18 16:36:44 UTC
Post by James Harris
I found that writing 512-byte blocks to the drive (under Linux, with dd)
was very slow. A complete overwrite took four hours. By contrast,
writing it in 4k blocks took an hour and a half.
That's an 8 to 1 reduction in the number of physical I/O
requests when going from 512-byte to 4k. I would expect it (4k)
to run 8 times faster, regardless of the underlying hardware
sector size.
