file Command: Detecting File Types
What You'll Learn
- How the file command judges a file's type by its content, not its extension
- Why file sees through a renamed extension
- How to read a MIME type with file -i
- Handy options such as -b and compressed-file handling
Quick Summary
- file NAME tells you what a file really is in one shot
- It decides based on the leading bytes (the magic number), not the extension
- Rename a JPEG to .txt and file still reports it as a JPEG
Assumptions
- OS: Ubuntu / typical Linux
- file is preinstalled (its own file package, separate from coreutils)
1. What Is the file Command, and How Is It Different from Extensions?
Conclusion: file reads a file's content to determine its type, judging by the real bytes rather than the self-declared extension.
You might look at a name such as photo.jpg to know the type, but the file command reads the content itself to decide. Whatever the name claims, file reads the actual bytes.
file report.pdf
report.pdf: PDF document, version 1.7
file relies on the libmagic library and its pattern database, which is compiled into magic.mgc (source in /usr/share/misc/magic). Thousands of format signatures are registered there.
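To see the kind of bytes that file matches against, you can dump the start of a file yourself. The following is a minimal sketch using od; the function name magic_hex is hypothetical and not part of file, and real signature matching in libmagic is far more elaborate than looking at a few leading bytes.

```shell
#!/bin/sh
# magic_hex FILE [COUNT]: print the first COUNT bytes (default 4) of FILE as hex.
# Hypothetical helper for illustration; it only shows the bytes and does not
# identify the format.
magic_hex() {
    od -An -tx1 -N "${2:-4}" "$1" | tr -d ' \n'
    echo
}

# Usage: magic_hex report.pdf
```

A PDF begins with the ASCII characters %PDF, so for a PDF this prints 25504446.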
2. Why Can't You Trust the Extension?
Conclusion: An extension is just part of the name and guarantees nothing. Because file judges by content, it is not fooled by a faked extension.
A file may have been saved with the wrong extension, such as .txt, or a file from Windows arrives with no extension at all. In those cases you want to confirm the real content. As a test, rename a JPEG to .txt:
mv penguin.jpg penguin.txt
file penguin.txt
penguin.txt: JPEG image data, JFIF standard 1.01, resolution (DPI), 72x72
The extension is now .txt, but file still saw it was a JPEG!
3. How Do You Use the file Command?
Conclusion: file NAME is the basic form. Pass several names or a wildcard to check many files at once.
Pass a single name, or use the wildcard * to check them all together.
file notes.txt
file *
notes.txt: ASCII text
archive.tar.gz: gzip compressed data, original size modulo 2^32 10240
script.sh: Bourne-Again shell script, ASCII text executable
image.png: PNG image data, 800 x 600, 8-bit/color RGBA, non-interlaced
Directories, symbolic links, and empty files are reported as directory, symbolic link to ..., and empty respectively.
4. How Do You Get the MIME Type? (-i)
Conclusion: file -i returns a MIME type such as text/plain; charset=utf-8, which is convenient for scripts.
What about the text/plain notation I sometimes see: can file print it? Add -i (--mime) and you get the MIME-type form, which is easy for programs to parse. The character set (charset) is shown too.
file -i notes.txt
file -i report.pdf
notes.txt: text/plain; charset=us-ascii
report.pdf: application/pdf; charset=binary
When you want only the type or only the encoding, use --mime-type / --mime-encoding.
file --mime-type photo.png
photo.png: image/png
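Because the --mime-type output is compact, it fits naturally into a case statement in a shell script. A rough sketch follows; the function names mime_of and classify_mime and the category labels are illustrative assumptions, not part of file.

```shell
#!/bin/sh
# mime_of FILE: print only the MIME type, e.g. "image/png".
# -b drops the "NAME: " prefix so the output is just the type.
mime_of() {
    file -b --mime-type -- "$1"
}

# classify_mime TYPE: map a MIME type string to a coarse label.
# The labels are arbitrary choices for this example.
classify_mime() {
    case "$1" in
        text/*)          echo "text" ;;
        image/*)         echo "image" ;;
        application/pdf) echo "pdf" ;;
        *)               echo "other" ;;
    esac
}

# Usage: classify_mime "$(mime_of photo.png)"
```

Keeping the classification separate from the call to file means the mapping can be checked with plain strings, without needing sample files.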
5. Which Options Are Worth Knowing?
Conclusion: Knowing -b (omit name), -L (follow links), -z (look inside compressed files), and -s (device files) is enough for most work.
| Option | Meaning |
|---|---|
| -b / --brief | Print the type only, without the filename |
| -i / --mime | Output in MIME-type form |
| -L | Follow a symbolic link and judge its target |
| -z | Look inside a compressed file to judge it |
| -s | Read block / character device files |
| -k | Keep going instead of stopping at the first match |
When is -b useful? When you want only the type string, for example in a script, -b is easier.
file -b script.sh
Bourne-Again shell script, ASCII text executable
With -z you can confirm "what kind of compression plus the inner file's type" without unpacking a .gz.
file -z logs.tar.gz
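A quick way to try -z without touching real data is to build a throwaway archive first. The sketch below assumes tar (with gzip support) and mktemp are available and uses a hypothetical helper name, demo_z; the exact wording of the output varies between file versions, so none is shown here.

```shell
#!/bin/sh
# demo_z: create a small .tar.gz in a temporary directory and inspect it
# with and without -z. Hypothetical helper for illustration only.
demo_z() {
    dir=$(mktemp -d) || return 1
    echo "hello" > "$dir/a.log"
    tar -czf "$dir/logs.tar.gz" -C "$dir" a.log

    file -b "$dir/logs.tar.gz"      # describes the gzip layer only
    file -b -z "$dir/logs.tar.gz"   # also describes what is inside the gzip layer

    rm -rf "$dir"
}

# Usage: demo_z
```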
6. Where Is It Used in Real Work?
Conclusion: Confirming a downloaded file's identity, telling text from binary, and checking an archive before extracting are common uses.
Sometimes you need to confirm that a file fetched with curl is really what you wanted. A single file check tells you whether an HTML error page came down instead.
curl -sL https://example.com/app.tar.gz -o app.tar.gz
file app.tar.gz
app.tar.gz: gzip compressed data, ...
If it says HTML document here, you downloaded an error page, not an archive. Catching that before extraction is the value of file.
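The same check can be automated so that extraction never runs on an error page. This is a sketch under assumptions: the helper names is_gzip_mime and safe_extract are hypothetical, and the accepted MIME strings (application/gzip and the older application/x-gzip) reflect what different file versions report for gzip data.

```shell
#!/bin/sh
# is_gzip_mime TYPE: succeed if TYPE is a MIME type used for gzip data.
is_gzip_mime() {
    case "$1" in
        application/gzip|application/x-gzip) return 0 ;;
        *) return 1 ;;
    esac
}

# safe_extract ARCHIVE: extract only if file identifies gzip content.
# Hypothetical helper; adjust the accepted types for other archive formats.
safe_extract() {
    mime=$(file -b --mime-type -- "$1") || return 1
    if is_gzip_mime "$mime"; then
        tar -xzf "$1"
    else
        echo "refusing to extract $1: detected $mime" >&2
        return 1
    fi
}

# Usage: safe_extract app.tar.gz
```

Matching on the MIME type rather than the human-readable description keeps the comparison to a short, fixed set of strings.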
7. What Are the Common Pitfalls?
Conclusion: file makes an estimate, not a guarantee. Forget -L for symlinks or -s for device files and you get unexpected results.
The output of file is an estimate. Custom formats without a magic number, or files too short to sample, may be reported vaguely as data or ASCII text. For critical decisions, combine it with other tools (stat, checksums, and so on).
When I ran file on a symbolic link, it only showed info about the link itself. To judge the link's target instead, add -L.
file -L /usr/bin/python3
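The difference is easy to reproduce with a throwaway link. A minimal sketch, assuming mktemp is available; demo_symlink is a hypothetical name, and the first line of output is not shown because it contains a random temporary path.

```shell
#!/bin/sh
# demo_symlink: compare file's report on a link with and without -L.
# Hypothetical helper for illustration only.
demo_symlink() {
    dir=$(mktemp -d) || return 1
    echo "hello" > "$dir/target.txt"
    ln -s "$dir/target.txt" "$dir/link"

    file -b "$dir/link"      # reports the link itself
    file -b -L "$dir/link"   # follows the link and reports the target's type

    rm -rf "$dir"
}

# Usage: demo_symlink
```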
8. Mini Exercise
Conclusion: Hands-on practice makes it stick. Confirm for yourself that file sees through a changed extension.
Try It
- Create some text: echo "hello" > sample.txt
- Check the type: file sample.txt
- Rename it: mv sample.txt sample.bin
- Check again: file sample.bin (does the result change when the extension does?)
Sample Answer
It prints sample.bin: ASCII text. The content has not changed, so the verdict (ASCII text) does not change either — proof that file does not depend on the extension.