split Command: Splitting and Joining Large Files
What If a File Is Too Big?
This is a job for the split command. It cuts a big file into small pieces that you can join back together later, exactly as they were. Let's walk through it.
What You'll Learn
- How to split a large file by size, lines, or piece count with split
- How to join the pieces back into the original file with cat
- How to use numbered suffixes (part_01 instead of xaa)
- How to verify the file is intact after splitting and joining
1. What Is the split Command?
Conclusion: split breaks one file into several smaller files; concatenating them with cat restores the original byte-for-byte.
split only cuts a copy into pieces; the original stays untouched. And when you join the pieces in order, you get the original back without losing a single byte.
First, create a file to practice with.
# Create a 50 MiB dummy file
$ dd if=/dev/zero of=bigfile.dat bs=1M count=50
$ ls -lh bigfile.dat
-rw-r--r-- 1 user user 50M Jun 5 10:00 bigfile.dat
2. How to Split by Size
Conclusion: Use split -b SIZE file prefix. -b 100M makes 100 MiB pieces, -b 10M makes 10 MiB pieces.
The option to use is -b (for bytes). The part_ at the end is the prefix added to the front of each output file name.
$ split -b 10M bigfile.dat part_
$ ls -lh part_*
-rw-r--r-- 1 user user 10M Jun 5 10:01 part_aa
-rw-r--r-- 1 user user 10M Jun 5 10:01 part_ab
-rw-r--r-- 1 user user 10M Jun 5 10:01 part_ac
-rw-r--r-- 1 user user 10M Jun 5 10:01 part_ad
-rw-r--r-- 1 user user 10M Jun 5 10:01 part_ae
The pieces are named part_aa, part_ab... with letters increasing. If you leave out the prefix, the default names are xaa, xab... The size units are K, M, G. Note that 10M means 10 x 1024 x 1024 bytes, while 10MB means 10 x 1000 x 1000 bytes.
Handy size guide
- split -b 700M -> fits on one CD
- split -b 100M -> easy cloud-upload size
- split -b 1G -> 1 GiB per piece
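The difference between M and MB is easiest to see on a small file. The following is a minimal sketch, assuming GNU coreutils (split, dd, stat); the file names demo.dat, bin_ and dec_ and the scratch directory are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Sketch: compare the binary (M) and decimal (MB) size suffixes of split.
# Assumes GNU coreutils. All file names here are hypothetical.
set -eu
tmp=$(mktemp -d)              # scratch directory so nothing else is touched
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# A 3 MiB practice file (3 x 1024 x 1024 = 3,145,728 bytes).
dd if=/dev/zero of=demo.dat bs=1M count=3 status=none

split -b 1M  demo.dat bin_    # 1M  = 1,048,576 bytes per piece
split -b 1MB demo.dat dec_    # 1MB = 1,000,000 bytes per piece

bin_count=$(ls bin_* | wc -l)
dec_count=$(ls dec_* | wc -l)
echo "pieces with -b 1M:  $bin_count"   # 3
echo "pieces with -b 1MB: $dec_count"   # 4 (three full pieces plus a remainder)
stat -c '%n %s bytes' bin_aa dec_aa
```

The same file yields a different number of pieces depending on the suffix, which is why it pays to pick one unit and stick with it.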
3. How to Split by Line Count
Conclusion: For text and logs, split -l LINES file prefix splits on line boundaries so no line is ever cut in half.
The option to use is -l (for lines). Splitting by size can cut a line right in the middle, but -l always breaks at a line boundary. Much safer for CSV and logs.
$ split -l 1000 access.log chunk_
$ wc -l chunk_*
1000 chunk_aa
1000 chunk_ab
342 chunk_ac
2342 total
Size splitting (-b) cuts mechanically at a byte offset, so in a text file a line may be split across two pieces. When line meaning matters, always use -l.
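To see a line actually being cut, here is a small sketch that splits the same text file both ways. It assumes GNU coreutils; numbers.txt and the bytes_ and lines_ prefixes are hypothetical names used only for this demonstration.

```shell
#!/usr/bin/env bash
# Sketch: -b cuts at a byte offset, -l cuts at a line boundary.
# Assumes GNU coreutils. All file names here are hypothetical.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

seq 1 100 > numbers.txt            # one number per line, 292 bytes in total

split -b 100 numbers.txt bytes_    # 100 bytes per piece
split -l 40  numbers.txt lines_    # 40 lines per piece

# Byte 100 falls in the middle of the line "37", so that line is torn apart.
echo "end of bytes_aa:   $(tail -n 1 bytes_aa)"   # 3
echo "start of bytes_ab: $(head -n 1 bytes_ab)"   # 7

# With -l the boundary sits between two complete lines.
echo "end of lines_aa:   $(tail -n 1 lines_aa)"   # 40
echo "start of lines_ab: $(head -n 1 lines_ab)"   # 41
```

Joining either set with cat still restores numbers.txt exactly; the difference only matters when each piece has to be readable on its own.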
4. How to Split into a Fixed Number of Pieces
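Conclusion: split -n COUNT file prefix divides the file into exactly that many pieces of nearly equal size.
The option to use is -n (for number). Here it divides the whole file into 5 parts, so you don't have to calculate sizes. Like -b, it cuts at byte offsets, so in a text file a line can end up split across two pieces.
$ split -n 5 bigfile.dat group_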
$ ls -lh group_*
-rw-r--r-- 1 user user 10M Jun 5 10:05 group_aa
-rw-r--r-- 1 user user 10M Jun 5 10:05 group_ab
-rw-r--r-- 1 user user 10M Jun 5 10:05 group_ac
-rw-r--r-- 1 user user 10M Jun 5 10:05 group_ad
-rw-r--r-- 1 user user 10M Jun 5 10:05 group_ae
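The 50M example divides evenly by 5, which hides what happens when the size is not a multiple of the count. The sketch below uses a file whose size does not divide by 3, and also tries the l/N form of -n, which GNU split documents as splitting into N files without breaking lines. It assumes a reasonably recent GNU coreutils; lines.txt, raw_ and whole_ are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: split -n on a file whose size is not a multiple of the piece count.
# Assumes GNU coreutils. All file names here are hypothetical.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

seq 1 1000 > lines.txt           # 3893 bytes, not divisible by 3

split -n 3   lines.txt raw_      # three pieces, cut at byte offsets
split -n l/3 lines.txt whole_    # three pieces, never cutting inside a line

wc -c raw_*      # sizes are close but not all identical
wc -l whole_*    # every piece holds only complete lines

# Both sets rejoin to the original.
cat raw_*   | cmp - lines.txt && echo "raw pieces rejoin cleanly"
cat whole_* | cmp - lines.txt && echo "whole-line pieces rejoin cleanly"
```

How the leftover bytes are distributed among the pieces can differ between coreutils versions, so the sketch prints the sizes rather than assuming them.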
5. How to Join the Pieces Back
Conclusion: No special command is needed. cat prefix* > restored_file concatenates the pieces in order to rebuild the original.
There is a join command, but that's for joining table columns - completely different. To reassemble split pieces, you use cat.
Isn't that the cat command that displays files? Yes, but cat also concatenates multiple files in order. Redirect with > to write the result to a file, and you're done.
$ cat part_* > restored.dat
$ ls -lh restored.dat
-rw-r--r-- 1 user user 50M Jun 5 10:10 restored.dat
Watch the order. The * wildcard in cat part_* expands in alphabetical order, so part_aa -> part_ab -> ... stays correct. But if you name files with plain numbers like part_1, part_2, ... part_10, then part_10 sorts before part_2 and the pieces are joined in the wrong order. Use the zero-padded numbering in the next section to stay safe.
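The ordering problem can be reproduced without split at all. This is a small sketch that creates a few hand-named files and prints the order in which the shell expands the wildcard. The names part_N and padded_NN are hypothetical, and LC_ALL=C is set only to make the sort order predictable.

```shell
#!/usr/bin/env bash
# Sketch: how the * wildcard orders unpadded and zero-padded numeric names.
# All file names here are hypothetical.
set -eu
export LC_ALL=C                  # plain byte-order sorting for the demo
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

for i in 1 2 10; do
  printf 'piece %s\n' "$i" > "part_$i"
done
for i in 01 02 10; do
  printf 'piece %s\n' "$i" > "padded_$i"
done

unpadded=$(echo part_*)
padded=$(echo padded_*)
echo "$unpadded"   # part_1 part_10 part_2
echo "$padded"     # padded_01 padded_02 padded_10

cat part_*         # piece 10 is printed before piece 2
```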
6. How to Use Numbered Suffixes
Conclusion: -d gives numeric suffixes (00, 01...), -a sets the digit count, and --additional-suffix adds an extension.
Numbered names like 01, 02 are easier to follow than aa, ab. Use -d (for digits) to get numbers. Set the width with -a, and you can even add an extension like .part with --additional-suffix.
$ split -b 10M -d -a 2 --additional-suffix=.part bigfile.dat backup_
$ ls backup_*
backup_00.part backup_01.part backup_02.part backup_03.part backup_04.part
With zero-padded numbers (00, 01, ... 10, 11), cat backup_*.part > restored.dat always joins in the correct order. If you expect more than 100 pieces, use -a 3 for three digits.
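Choosing the -a width comes down to a little arithmetic: divide the file size by the piece size, round up, and count the digits of the highest suffix. The sketch below does this for a tiny file so it runs instantly. It assumes GNU coreutils, and demo.dat, backup_ and the 10-byte piece size are illustrative values only.

```shell
#!/usr/bin/env bash
# Sketch: work out how many digits -a needs before splitting with -d.
# Assumes GNU coreutils. File names and sizes here are hypothetical.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

head -c 1050 /dev/zero > demo.dat          # 1050-byte practice file
piece=10                                   # piece size in bytes (tiny on purpose)

size=$(stat -c %s demo.dat)
count=$(( (size + piece - 1) / piece ))    # round up: 105 pieces
last=$(( count - 1 ))                      # suffixes start at 0, so the highest is 104
width=${#last}                             # number of digits in 104: 3

echo "pieces: $count, suffix width: $width"
split -b "$piece" -d -a "$width" demo.dat backup_

made=$(ls backup_* | wc -l)
echo "created $made files, last one: $(ls backup_* | tail -n 1)"   # backup_104
```

With -a 2 the same split would stop with an error once the two-digit suffixes run out, so computing the width first avoids a half-finished split.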
7. How to Verify the File Is Intact
Conclusion: Compare sha256sum hashes before and after. Matching values prove the file was restored byte-for-byte.
That's what sha256sum is for. It's a kind of "fingerprint" computed from the file's contents. If the original and restored files have the same fingerprint, they're identical. It also catches corruption during transfer or copy.
$ sha256sum bigfile.dat restored.dat
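[64-character hash] bigfile.dat
[64-character hash] restored.dat
Both lines must show the same hash. A hash that starts with e3b0c44298fc1c14 is the SHA-256 of empty input, so seeing it means the file is empty.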
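Comparing two long hashes by eye is error-prone, so the check can be left to the shell. The following is a minimal sketch of the whole round trip with an automated comparison, assuming GNU coreutils; orig.dat, piece_ and joined.dat are hypothetical names, and the file is filled with random data so that a wrong join order would actually change the hash.

```shell
#!/usr/bin/env bash
# Sketch: split, rejoin, and compare hashes without reading them by eye.
# Assumes GNU coreutils. All file names here are hypothetical.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# 64 KiB of random data as the practice file.
dd if=/dev/urandom of=orig.dat bs=1K count=64 status=none

split -b 10K -d orig.dat piece_      # piece_00 ... piece_06
cat piece_* > joined.dat

# Keep only the hash column of the sha256sum output.
orig_hash=$(sha256sum orig.dat | cut -d ' ' -f 1)
joined_hash=$(sha256sum joined.dat | cut -d ' ' -f 1)

if [ "$orig_hash" = "$joined_hash" ]; then
  result="OK: hashes match"
else
  result="MISMATCH: do not delete the original"
fi
echo "$result"
```

When both files sit on the same machine, cmp orig.dat joined.dat gives the same answer; the hash is the more useful tool when the pieces travel to another machine and only the fingerprint of the original can be sent along.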
Mini Exercise
Create a 30MB file called practice.dat, then (1) split it into 7MB pieces, (2) join them with cat, and (3) confirm the hash matches the original.
Hint: dd if=/dev/zero of=practice.dat bs=1M count=30 -> split -b 7M practice.dat p_ -> cat p_* > joined.dat -> sha256sum practice.dat joined.dat
8. Common Pitfalls and Fixes
Conclusion: Most trouble comes from join order, unit confusion, or running out of disk space. Check capacity and units before you split.
| Symptom | Cause | Fix |
|---|---|---|
| Joined file is corrupted | Wrong join order | Use zero-padded -d and cat ...* |
| More/fewer pieces than expected | M (1024) vs MB (1000) | Use one unit consistently |
| No space left on device | Splitting needs ~2x the space | Check free space with df -h first |
| Text lines cut in half | You used -b (bytes) | Re-split with -l (lines) |
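The free-space check from the table can also be done by a script before it starts splitting. This is a rough sketch, assuming GNU coreutils (df with --output and stat with -c); demo.dat and part_ are hypothetical names, and the check only compares the file size with the free bytes on the current filesystem.

```shell
#!/usr/bin/env bash
# Sketch: refuse to split when the pieces would not fit on the disk.
# Assumes GNU coreutils. All file names here are hypothetical.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

head -c 4096 /dev/zero > demo.dat                  # 4 KiB practice file

# The pieces take up as much room as the original, so that is what must be free.
need=$(stat -c %s demo.dat)
# Free bytes on the filesystem holding the current directory.
avail=$(df --output=avail -B1 . | tail -n 1 | tr -d ' ')

echo "need $need bytes, $avail bytes available"
if [ "$avail" -gt "$need" ]; then
  split -b 1K demo.dat part_
  status="split done"
else
  status="not enough space"
fi
echo "$status"
```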
Don't do this
- Deleting the original before testing the join
- Skipping the hash verification
- Starting a split without checking free space