pandas parse csv with newlines


A while ago I wrapped my data in quote characters on both sides and read it into pandas (see "pandas parse csv with left and right quote chars"). Now I also need to support newlines and some unusual characters inside the fields.
A minimal sample is below: the first string (temp) parses fine, but the second one, which has newlines inside a field, does not parse properly.
import pandas as pd
from io import StringIO  # pandas.compat.StringIO was removed in later pandas versions
temp=u"""<first>$$><$$<second>$$><$$<first>
<foo>$$><$$<bar>$$><$$<baz>"""
temp=u"""<first>$$><$$<second>$$><$$<third>
<foo>$$><$$<bar>$$><$$<baz>
<foo>$$><$$<Green; kkkk 101; aaaa, bbb; [foo<1>>aaa<123>>xxx<1>>zzz<1.17989207 | 18187681 | asdf |>>
;sdf{
}
;ADD{
]>$$><$$<baz>"""
big_df = pd.read_csv(StringIO(temp),
                     encoding='utf8',
                     sep=r'\$\$><\$\$',
                     decimal=',',
                     engine='python')  # the optimized C parser does not support regex delimiters
# strip a trailing "$$>" record marker from the last column
big_df.iloc[:, -1] = big_df.iloc[:, -1].str.replace(r'\$\$>$', '', regex=True)
# strip the enclosing "<" and ">" quote characters from values and column names
big_df = big_df.replace([r'^<', r'>$'], ['', ''], regex=True)
big_df.columns = big_df.columns.to_series().replace([r'^<', r'>$', r'>\$\$'], ['', '', ''], regex=True)
big_df
Edit
As outlined in the comment, everything parses fine once each record is on a single line.
How could I automate this, perhaps with sed or awk?
awk '{printf("%s ",$0)} END{print ""}' sample.csv removes all newlines and concatenates everything into a single line, but I only want to remove the problematic newlines.
awk -F, 'NF < 4 {getline nextline; $0 = $0 nextline} 1' sample.csv already removes the ordinary embedded newlines, but the additional blank lines are still left.
So your "real" newlines are marked with $$>\n. Read your file into a string, replace $$>\n with a temporary placeholder, remove any remaining newlines, restore the "real" newlines, then pass the result to read_csv().
temp = temp.replace('$$>\n', '%%NEWLINE%%').replace('\n','').replace('%%NEWLINE%%', '\n')
big_df = pd.read_csv(StringIO(temp), ...)
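
The same idea can be wrapped in a small helper. This is only a sketch under the answer's assumption that every real record ends with $$> followed by a newline. The function name collapse_embedded_newlines, the placeholder token, and the sample string are made up for illustration. Note that the $$> marker is consumed together with the newline, exactly as in the one-liner above.

```python
RECORD_END = "$$>\n"         # assumed marker: "$$>" directly before a real line break
PLACEHOLDER = "%%NEWLINE%%"  # hypothetical token; it must never occur in the data


def collapse_embedded_newlines(text):
    """Keep only the newlines that follow '$$>' and drop every other newline."""
    if PLACEHOLDER in text:
        # Refuse to run if the temporary token could collide with real data.
        raise ValueError("placeholder already occurs in the input")
    text = text.replace(RECORD_END, PLACEHOLDER)  # protect the real record ends
    text = text.replace("\n", "")                 # remove newlines inside fields
    return text.replace(PLACEHOLDER, "\n")        # restore the real record ends


# Illustrative input: the second record has newlines inside its middle field.
sample = (
    "<first>$$><$$<second>$$><$$<third>$$>\n"
    "<foo>$$><$$<bar\n;sdf{\n}>$$><$$<baz>$$>\n"
)
cleaned = collapse_embedded_newlines(sample)
for line in cleaned.splitlines():
    print(line)
```

The cleaned string has one record per line and can then be wrapped in StringIO and handed to read_csv() with the same sep and engine arguments as in the question. Files with Windows line endings would need the \r characters handled first, which this sketch does not do.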
