file Command: Detecting File Types

What You'll Learn

  • How the file command judges a file's type by its content, not its extension
  • Why file sees through a renamed extension
  • How to read a MIME type with file -i
  • Handy options such as -b and compressed-file handling

Quick Summary

  • file NAME tells you what a file really is in one shot
  • It decides mainly from signature bytes near the start of the file (the magic number), not the extension
  • Rename a JPEG to .txt and file still reports it as a JPEG

Assumptions

  • OS: Ubuntu / typical Linux
  • file is installed (it ships in its own file package, separate from coreutils, so minimal or container images may lack it)

1. What Is the file Command, and How Is It Different from Extensions?

Conclusion: file reads a file's content to determine its type, judging by the real bytes rather than the self-declared extension.

Lina: Can't I just read the extension, like photo.jpg, to know the type?
Linny-senpai: Usually, yes. But an extension is "self-declared." Rename the file and the name can disagree with the content. The file command reads the content itself to decide.
Lina: Read the content? Don't I have to open the file to see what's inside?
Linny-senpai: Most formats put a marker in their first few bytes that says "this is what I am." That marker is called a magic number, and file reads it.
file report.pdf
report.pdf: PDF document, version 1.7

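file relies on the libmagic library, which matches a file's content against a pattern database compiled into magic.mgc (source in /usr/share/misc/magic). Thousands of format signatures are registered there. Formats that carry no signature, such as plain text, are recognized by examining the bytes for a known character encoding.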
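To make the idea of a magic number concrete, the following sketch writes a tiny file that begins with the PDF signature and prints its leading bytes by hand. The name demo.bin is hypothetical and the content is not a valid PDF. Only head and od are used, so nothing here depends on file itself.

```shell
# Write a small file that starts with the PDF signature "%PDF-".
# demo.bin is a hypothetical name; this is not a complete PDF.
printf '%%PDF-1.7\n' > demo.bin

# Capture the first five bytes: the marker a signature-based check looks for.
signature=$(head -c 5 demo.bin)
echo "leading bytes: $signature"

# Dump the same bytes one character at a time for a closer look.
head -c 5 demo.bin | od -c
```

Comparing these bytes against a table of known signatures is, roughly, what the pattern database lets file do for thousands of formats.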

2. Why Can't You Trust the Extension?

Conclusion: An extension is just part of the name and guarantees nothing. Because file judges by content, it is not fooled by a faked extension.

Lina: When does an unreliable extension actually cause trouble?
Linny-senpai: Say you accidentally rename an image to .txt, or a file from Windows arrives with no extension at all. In those cases you want to confirm the real content.
Lina: Let me try it. I'll rename an image to .txt.
mv penguin.jpg penguin.txt
file penguin.txt
penguin.txt: JPEG image data, JFIF standard 1.01, resolution (DPI), 72x72
Lina: The name says .txt, but it still saw it was a JPEG!
Linny-senpai: Right. It looks at content, not the name. The same is true for files with no extension at all.
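
The same behavior can be reproduced without an image on hand. The sketch below gives gzip data a .txt name and asks file what it is. The name disguised.txt is hypothetical, and gzip is assumed to be installed.

```shell
# Create gzip-compressed data but give it a misleading .txt name.
# disguised.txt is a hypothetical filename used only for this demo.
printf 'hello\n' | gzip -c > disguised.txt

# file ignores the name and reports what the bytes say.
verdict=$(file disguised.txt)
echo "$verdict"
```

The report should mention gzip compressed data even though the name ends in .txt.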

3. How Do You Use the file Command?

Conclusion: file NAME is the basic form. Pass several names or a wildcard to check many files at once.

Lina: Show me the basic usage.
Linny-senpai: Just hand it a filename. You can list several, or use * to check them all together.
file notes.txt
notes.txt: ASCII text
file *
archive.tar.gz: gzip compressed data, original size modulo 2^32 10240
image.png:      PNG image data, 800 x 600, 8-bit/color RGBA, non-interlaced
notes.txt:      ASCII text
script.sh:      Bourne-Again shell script, ASCII text executable

Directories, symbolic links, and empty files are reported as directory, symbolic link to ..., and empty respectively.
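
A quick way to see these special cases is to build one of each in a scratch directory. This is a minimal sketch; every name in it is hypothetical, and mktemp -d is assumed to be available.

```shell
# Build a scratch directory holding one of each special case.
# All names below are hypothetical.
workdir=$(mktemp -d)
mkdir "$workdir/somedir"                 # a directory
: > "$workdir/empty.dat"                 # a zero-byte file
printf 'hello\n' > "$workdir/real.txt"   # a regular text file
ln -s real.txt "$workdir/link"           # a symlink pointing at it

# Inspect everything at once; each kind gets its own wording.
file "$workdir"/*
```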

4. How Do You Get the MIME Type? (-i)

Conclusion: file -i returns a MIME type such as text/plain; charset=utf-8, which is convenient for scripts.

Lina: That text/plain notation I sometimes see — can file print it?
Linny-senpai: Add -i (--mime) and you get the MIME-type form, which is easy for programs to parse. The character set (charset) is shown too.
file -i notes.txt
file -i report.pdf
notes.txt: text/plain; charset=us-ascii
report.pdf: application/pdf; charset=binary

When you want only the type or only the encoding, use --mime-type / --mime-encoding.

file --mime-type photo.png
photo.png: image/png
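
Because the MIME form is stable and compact, a script can branch on it directly. The sketch below assumes a hypothetical helper named classify and a throwaway file mime-demo.txt; neither is part of file.

```shell
# classify PATH: print a coarse category based on the detected MIME type.
# classify is a hypothetical helper, not part of file.
classify() {
    # -b drops the "name:" prefix so only the MIME type remains.
    mime=$(file -b --mime-type "$1")
    case "$mime" in
        text/*)  echo "text" ;;
        image/*) echo "image" ;;
        *)       echo "other ($mime)" ;;
    esac
}

# Try it on a small text file, then ask for the encoding alone.
printf 'hello\n' > mime-demo.txt
classify mime-demo.txt
file -b --mime-encoding mime-demo.txt
```

Matching on a prefix such as text/* keeps the script tolerant of subtypes it has not seen before.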

5. Which Options Are Worth Knowing?

Conclusion: Knowing -b (omit name), -L (follow links), -z (look inside compressed files), and -s (device files) is enough for most work.

Option         Meaning
-b / --brief   Print the type only, without the filename
-i / --mime    Output in MIME-type form
-L             Follow a symbolic link and judge its target
-z             Look inside a compressed file to judge it
-s             Read block / character device files
-k             Keep going instead of stopping at the first match

Lina: When is -b useful?
Linny-senpai: When a script consumes the output. The leading filename gets in the way, so if you only want the type, -b is easier.
file -b script.sh
Bourne-Again shell script, ASCII text executable

With -z you can confirm both the compression format and the type of the data inside without unpacking a .gz. For a .tar.gz, the inner type is the tar archive itself, not the files it contains.

file -z logs.tar.gz
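
To compare the two modes side by side, here is a small sketch on a compressed text file. The names inner.txt and inner.txt.gz are hypothetical, and gzip is assumed to be installed.

```shell
# Make a small text file and a gzip-compressed copy of it.
# The names are hypothetical.
printf 'hello\n' > inner.txt
gzip -c inner.txt > inner.txt.gz

# Without -z: only the compression layer is described.
file inner.txt.gz

# With -z: the type of the decompressed data comes first,
# followed by the compression details in parentheses.
file -z inner.txt.gz
```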

6. Where Is It Used in Real Work?

Conclusion: Confirming a downloaded file's identity, telling text from binary, and checking an archive before extracting are common uses.

Lina: Where would I actually reach for this on the job?
Linny-senpai: For example, to confirm a file you fetched with curl is really what you wanted. A single file check tells you whether an HTML error page came down instead.
curl -sL https://example.com/app.tar.gz -o app.tar.gz
file app.tar.gz
app.tar.gz: gzip compressed data, ...

If it says HTML document here, you downloaded an error page, not an archive. Catching that before extraction is the value of file.
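
That check can be automated before extraction. The following is a sketch, not a definitive implementation: is_gzip is a hypothetical helper, and the two local files stand in for a successful download and an HTML error page so that no network access is needed. Both MIME spellings are accepted on the assumption that older versions of file report application/x-gzip.

```shell
# is_gzip PATH: succeed only when file identifies PATH as gzip data.
# is_gzip is a hypothetical helper, not part of file.
is_gzip() {
    case "$(file -b --mime-type "$1")" in
        application/gzip|application/x-gzip) return 0 ;;
        *) return 1 ;;
    esac
}

# Simulate both outcomes locally instead of downloading.
printf 'payload\n' | gzip -c > good.tar.gz
printf '<!DOCTYPE html>\n<html><body>404 Not Found</body></html>\n' > bad.tar.gz

for f in good.tar.gz bad.tar.gz; do
    if is_gzip "$f"; then
        echo "$f: looks like gzip, safe to try extracting"
    else
        echo "$f: not gzip, got: $(file -b "$f")"
    fi
done
```

In a real script the same guard would sit between the curl call and the tar call, so a bad download stops the run early.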

7. What Are the Common Pitfalls?

Conclusion: file makes an estimate, not a guarantee. Forget -L for symlinks or -s for device files and you get unexpected results.

Lina: When I ran file on a symbolic link, it only showed info about the link itself.
Linny-senpai: By default it inspects the link itself. To judge the target's real content, add -L.
file -L /usr/bin/python3
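
The difference is easy to reproduce on a scratch link. In this sketch, target-demo.txt and alias-demo are hypothetical names.

```shell
# Create a text file and a symbolic link that points at it.
# Both names are hypothetical.
printf 'hello\n' > target-demo.txt
ln -sf target-demo.txt alias-demo

# Default: file describes the link itself.
file alias-demo

# With -L: file follows the link and describes the target's content.
file -L alias-demo
```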

8. Mini Exercise

Conclusion: Hands-on practice makes it stick. Confirm for yourself that file sees through a changed extension.

Try It

  1. Create some text: echo "hello" > sample.txt
  2. Check the type: file sample.txt
  3. Rename it: mv sample.txt sample.bin
  4. Check again: file sample.bin. Does the result change when the extension does?

Sample Answer

It prints sample.bin: ASCII text. The content has not changed, so the verdict (ASCII text) does not change either — proof that file does not depend on the extension.
